
python - Using urllib2 to retrieve an arbitrary file from a URL and save it to a named file

Reposted · Author: 太空宇宙 · Updated: 2023-11-04 05:58:42

I am writing a Python script that uses the urllib2 module as an equivalent of the command-line utility wget. The only feature I need is to retrieve an arbitrary file from a URL and save it to a named file. I also only need to worry about two command-line arguments: the URL to download from, and the name of the file to save the contents to.

Example:

python Prog7.py www.python.org pythonHomePage.html

Here is my code:

import urllib
import urllib2
#import requests

url = 'http://www.python.org/pythonHomePage.html'

print "downloading with urllib"
urllib.urlretrieve(url, "code.txt")

print "downloading with urllib2"
f = urllib2.urlopen(url)
data = f.read()
with open("code2.txt", "wb") as code:
    code.write(data)

urllib seems to work, but urllib2 does not.

The error received:

  File "Problem7.py", line 11, in <module>
    f = urllib2.urlopen(url)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib64/python2.6/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.6/urllib2.py", line 429, in error
    result = self._call_chain(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 616, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib64/python2.6/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.6/urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: NOT FOUND

Best answer

That URL simply doesn't exist; https://www.python.org/pythonHomePage.html really is a 404 Not Found page.

The difference between urllib and urllib2 here is that the latter automatically raises an exception when a 404 response comes back, while urllib.urlretrieve() just saves the error page for you:
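(For what it's worth, in Python 3 this behavioral split goes away: urllib.request.urlretrieve() calls urlopen() internally, so it too raises urllib.error.HTTPError on a 404. A small offline demonstration; the Always404 handler and the throwaway local server are my own scaffolding, not part of the original question:)

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Always404(BaseHTTPRequestHandler):
    # Stand-in for a missing page: every GET answers 404 with a small body.
    def do_GET(self):
        body = b"NOT FOUND"
        self.send_response(404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), Always404)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/missing.html" % server.server_port

try:
    # Unlike Python 2's urllib.urlretrieve, this raises instead of
    # silently saving the error page.
    urllib.request.urlretrieve(url, "page.html")
except urllib.error.HTTPError as err:
    print("urlretrieve raised: HTTP Error %d" % err.code)
server.shutdown()
```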

>>> import urllib
>>> urllib.urlopen('https://www.python.org/pythonHomePage.html').getcode()
404
>>> import urllib2
>>> urllib2.urlopen('https://www.python.org/pythonHomePage.html')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib2.py", line 410, in open
response = meth(req, response)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib2.py", line 448, in error
return self._call_chain(*args)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: NOT FOUND

If you want to save the error page anyway, you can catch the urllib2.HTTPError exception:

try:
    f = urllib2.urlopen(url)
    data = f.read()
except urllib2.HTTPError as err:
    data = err.read()
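This works because HTTPError objects are themselves file-like and carry the error response body. A runnable Python 3 sketch of the same trick (urllib.request/urllib.error replace urllib2; the ErrorPage handler and fetch_even_on_error name are mine, added so the example can run against a local throwaway server instead of the network):

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ErrorPage(BaseHTTPRequestHandler):
    # Serves a custom 404 body for every GET, like a site's "not found" page.
    def do_GET(self):
        body = b"<html>custom 404 page</html>"
        self.send_response(404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging

def fetch_even_on_error(url):
    # urlopen raises HTTPError for 4xx/5xx, but the exception object is
    # file-like: err.read() returns the error page body.
    try:
        return urllib.request.urlopen(url).read()
    except urllib.error.HTTPError as err:
        return err.read()

server = HTTPServer(("127.0.0.1", 0), ErrorPage)
threading.Thread(target=server.serve_forever, daemon=True).start()
data = fetch_even_on_error("http://127.0.0.1:%d/gone" % server.server_port)
print(data)  # the custom 404 body, not a traceback
server.shutdown()
```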

Regarding "python - Using urllib2 to retrieve an arbitrary file from a URL and save it to a named file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26336521/
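Putting the question and the answer together, a minimal Python 3 version of the two-argument utility might look like this (download_file is a name I chose; urllib.request is the modern successor to urllib2, so a missing page raises rather than being silently saved):

```python
import sys
import urllib.request

def download_file(url, filename):
    # Fetch the URL and write the raw response bytes to the named file.
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so an
    # error page is never mistaken for the requested content.
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with open(filename, "wb") as out:
        out.write(data)

# Command-line entry point: python Prog7.py URL FILENAME
if __name__ == "__main__" and len(sys.argv) == 3:
    download_file(sys.argv[1], sys.argv[2])
```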
