python - How to catch a 404 error in urllib.urlretrieve

Reposted by: IT老高 | Updated: 2023-10-28 22:18:12

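Background: I am using urllib.urlretrieve, rather than any other function in the urllib* modules, because it supports a hook function (see reporthook below), which I use to display a textual progress bar. This is Python >= 2.6.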

>>> urllib.urlretrieve(url[, filename[, reporthook[, data]]])
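As an illustration of the reporthook mentioned above: urlretrieve calls the hook with the block count so far, the block size in bytes, and the total size (which may be -1 when the server does not report it). The sketch below is one possible text progress bar; `format_progress`, its `width` parameter, and the body of `my_report_hook` are hypothetical and not part of the original question.

```python
import sys


def format_progress(block_num, block_size, total_size, width=30):
    """Return a one-line text progress bar for the given transfer state."""
    if total_size <= 0:
        # Total size unknown: report only the bytes transferred so far.
        return "%d bytes" % (block_num * block_size)
    # The last block is usually partial, so clamp to the total size.
    done = min(block_num * block_size, total_size)
    filled = width * done // total_size
    percent = 100 * done // total_size
    return "[%s%s] %3d%%" % ("#" * filled, "-" * (width - filled), percent)


def my_report_hook(block_num, block_size, total_size):
    """reporthook callback: redraw the bar in place on stderr."""
    sys.stderr.write("\r" + format_progress(block_num, block_size, total_size))
    sys.stderr.flush()
```

Keeping the formatting separate from the hook makes the bar easy to check without performing a download.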

However, urlretrieve is too dumb to let you detect the status of the HTTP request (for example, was it a 404 or a 200?).

>>> fn, h = urllib.urlretrieve('http://google.com/foo/bar')
>>> h.items() 
[('date', 'Thu, 20 Aug 2009 20:07:40 GMT'),
 ('expires', '-1'),
 ('content-type', 'text/html; charset=ISO-8859-1'),
 ('server', 'gws'),
 ('cache-control', 'private, max-age=0')]
>>> h.status
''
>>>

What is the best known way to download a remote HTTP file with hook-like support (to show a progress bar) and decent HTTP error handling?

Best answer

Take a look at the complete code of urllib.urlretrieve:

def urlretrieve(url, filename=None, reporthook=None, data=None):
  global _urlopener
  if not _urlopener:
    _urlopener = FancyURLopener()
  return _urlopener.retrieve(url, filename, reporthook, data)

In other words, you can use urllib.FancyURLopener (it is part of the public urllib API). You can override http_error_default to detect 404s:

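import urllib

class MyURLopener(urllib.FancyURLopener):
  def http_error_default(self, url, fp, errcode, errmsg, headers):
    # handle errors the way you'd like to; as one option, raise
    # instead of silently saving the error page to disk
    fp.close()
    raise IOError('http error', errcode, errmsg, headers)

fn, h = MyURLopener().retrieve(url, reporthook=my_report_hook)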
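The answer above targets Python 2. For readers on Python 3, where FancyURLopener is deprecated, a similar result can be sketched with urllib.request.urlopen, which raises urllib.error.HTTPError for a 404 by default. This is only a minimal sketch under that assumption: the `download` function, its `block_size` default, and the `opener` parameter (included so the function can be exercised without a network) are hypothetical names, not part of the original answer.

```python
import urllib.error
import urllib.request


def download(url, filename, reporthook=None, block_size=8192,
             opener=urllib.request.urlopen):
    """Save url to filename; HTTP error statuses surface as HTTPError."""
    # urlopen raises urllib.error.HTTPError for 404 and other error
    # statuses, so no error page is written to disk.
    response = opener(url)
    try:
        # -1 signals an unknown size, mirroring urlretrieve's convention.
        total_size = int(response.headers.get("Content-Length", -1))
        block_num = 0
        with open(filename, "wb") as out:
            if reporthook:
                reporthook(block_num, block_size, total_size)
            while True:
                block = response.read(block_size)
                if not block:
                    break
                out.write(block)
                block_num += 1
                if reporthook:
                    reporthook(block_num, block_size, total_size)
    finally:
        response.close()
    return filename, response.headers
```

A caller would wrap the call in `try: ... except urllib.error.HTTPError as e:` and inspect `e.code` to tell a 404 apart from other failures.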

For python - How to catch a 404 error in urllib.urlretrieve, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/1308542/