
python - Scrapy error: exceptions.IOError: cannot identify image file

Reposted. Author: 行者123. Last updated: 2023-11-28 16:47:47

I repeatedly get the following error, with no indication of the image file name or the response URL that triggered it:

2012-08-20 08:14:34+0000 [spider] Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
--- <exception caught here> ---
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 204, in media_downloaded
    checksum = self.image_downloaded(response, request, info)
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 252, in image_downloaded
    for key, image, buf in self.get_images(response, request, info):
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 261, in get_images
    orig_image = Image.open(StringIO(response.body))
  File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1980, in open
    raise IOError("cannot identify image file")
exceptions.IOError: cannot identify image file

So how can I fix this? It matters because my spider is stopped once the number of errors reaches the limit I have defined in settings.py.

Best answer

The offending line is the PIL Image.open() call in get_images() of scrapy.contrib.pipeline.images.ImagesPipeline:

def get_images(self, response, request, info):
    key = self.image_key(request.url)
    orig_image = Image.open(StringIO(response.body))
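Image.open() raises this error when the bytes it is given are not in a format PIL recognizes, for example when the server returned an HTML error page or a truncated body instead of the image (a plausible cause, not something the traceback confirms). The following is a minimal sketch, not part of Scrapy or PIL, for inspecting a response body before it reaches PIL. The names sniff_image_format and describe_body are hypothetical, and only the JPEG, PNG, and GIF signatures are checked.

```python
# Hypothetical diagnostic helpers; neither Scrapy nor PIL provides these names.
# Only a few well-known file signatures are checked, so None means
# "not JPEG/PNG/GIF", not necessarily "not an image".
IMAGE_SIGNATURES = (
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_image_format(body):
    """Return a format name if body starts with a known signature, else None."""
    for signature, name in IMAGE_SIGNATURES:
        if body.startswith(signature):
            return name
    return None


def describe_body(url, body, preview=40):
    """Build a one-line description suitable for a log message."""
    fmt = sniff_image_format(body)
    if fmt is not None:
        return "%s looks like %s (%d bytes)" % (url, fmt, len(body))
    # Showing the first bytes usually reveals HTML or an empty body.
    return "%s is not a recognized image; first bytes: %r" % (url, body[:preview])
```

Calling describe_body(request.url, response.body) from a custom pipeline and logging the result would show which URL returned the unexpected content.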

The try block in media_downloaded() catches the resulting IOError, but then logs it as an error itself:

except Exception:
    log.err(spider=info.spider)

You could hack images.py itself so that an IOError is logged as a warning instead:

try:
    key = self.image_key(request.url)
    checksum = self.image_downloaded(response, request, info)
except ImageException as ex:
    log.msg(str(ex), level=log.WARNING, spider=info.spider)
    raise
except IOError as ex:
    # Unreadable image data: log a warning rather than an error.
    log.msg(str(ex), level=log.WARNING, spider=info.spider)
    raise ImageException
except Exception:
    log.err(spider=info.spider)
    raise ImageException

A better option, though, is to create your own pipeline and override the image_downloaded() method in your pipelines.py file:

from scrapy import log
from scrapy.contrib.pipeline.images import ImagesPipeline


class BkamImagesPipeline(ImagesPipeline):

    def image_downloaded(self, response, request, info):
        try:
            # Return the checksum so media_downloaded() still receives it.
            return super(BkamImagesPipeline, self).image_downloaded(response, request, info)
        except IOError as ex:
            # Log unreadable images as warnings so they do not count as errors.
            log.msg(str(ex), level=log.WARNING, spider=info.spider)

Be sure to declare this pipeline in your settings file:

ITEM_PIPELINES = [
    'bkam.pipelines.BkamImagesPipeline',
]
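Since the question also asks which URL produced the bad image, the same override can record request.url when it catches the error. The sketch below illustrates that pattern without requiring Scrapy. StandInImagesPipeline, FakeRequest, and FakeResponse are hypothetical stand-ins for the real classes, and the standard logging module is used in place of scrapy.log. In a real project the subclass would extend ImagesPipeline as in the answer above.

```python
import logging

logger = logging.getLogger("images")


class FakeRequest(object):
    """Hypothetical stand-in for a Scrapy Request; only .url is used."""
    def __init__(self, url):
        self.url = url


class FakeResponse(object):
    """Hypothetical stand-in for a Scrapy Response; only .body is used."""
    def __init__(self, body):
        self.body = body


class StandInImagesPipeline(object):
    """Stand-in for ImagesPipeline: fails the way PIL does on non-PNG bodies."""
    def image_downloaded(self, response, request, info):
        if not response.body.startswith(b"\x89PNG"):
            raise IOError("cannot identify image file")
        return "fake-checksum"


class UrlLoggingImagesPipeline(StandInImagesPipeline):
    """Catches IOError, remembers the offending URL, and logs a warning."""
    def __init__(self):
        self.failed_urls = []

    def image_downloaded(self, response, request, info):
        try:
            return super(UrlLoggingImagesPipeline, self).image_downloaded(
                response, request, info)
        except IOError as ex:
            # Keep the URL so the bad image can be inspected afterwards.
            self.failed_urls.append(request.url)
            logger.warning("%s (url: %s)", ex, request.url)
            return None


pipeline = UrlLoggingImagesPipeline()
pipeline.image_downloaded(
    FakeResponse(b"<html>"), FakeRequest("http://example.com/bad.jpg"), None)
```

After the call, pipeline.failed_urls holds the URL whose body could not be identified, which is the piece of information missing from the original traceback.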

Regarding "python - Scrapy error: exceptions.IOError: cannot identify image file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12044488/
