python - scrapy error: exceptions.IOError: cannot identify image file

Reposted by 行者123, updated 2023-11-28 16:47:47

I keep getting the following error, and I have no way of knowing the image file name or the response URL that triggered it:

2012-08-20 08:14:34+0000 [spider] Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
--- <exception caught here> ---
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 204, in media_downloaded
    checksum = self.image_downloaded(response, request, info)
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 252, in image_downloaded
    for key, image, buf in self.get_images(response, request, info):
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 261, in get_images
    orig_image = Image.open(StringIO(response.body))
  File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1980, in open
    raise IOError("cannot identify image file")
exceptions.IOError: cannot identify image file

So how can I fix this? It matters because my spider stops once it reaches the number of errors I have defined in settings.py.

Best answer

The offending line is the call to PIL's Image.open() in get_images() of scrapy.contrib.pipeline.images.ImagesPipeline:

def get_images(self, response, request, info):
    key = self.image_key(request.url)
    orig_image = Image.open(StringIO(response.body))
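
PIL raises "cannot identify image file" when the bytes it receives do not match any image format it recognizes. One common cause (an assumption here; the answer does not say what the failing responses contained) is a server returning something other than an image, such as an HTML error page. The sketch below is a hypothetical, standard-library-only pre-check that looks for the signature bytes of a few common formats; `sniff_image_format` and `IMAGE_SIGNATURES` are illustrative names, not part of Scrapy or PIL.

```python
# Hypothetical helper, not part of Scrapy or PIL: check whether a response
# body starts with the signature ("magic bytes") of a few common formats.
IMAGE_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
)


def sniff_image_format(body):
    """Return the format name if `body` has a known signature, else None."""
    for signature, name in IMAGE_SIGNATURES:
        if body.startswith(signature):
            return name
    return None


print(sniff_image_format(b"\x89PNG\r\n\x1a\n\x00\x00"))  # PNG
# An HTML error page has no image signature, so PIL would reject it as well.
print(sniff_image_format(b"<html><body>Not Found</body></html>"))  # None
```

A check like this could be run on `response.body` and, when it returns None, logged together with `request.url`, which would also show which URL produced the bad file.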

The try block in media_downloaded() catches the resulting IOError, but then reports it as an error itself:

except Exception:
    log.err(spider=info.spider)

You could hack scrapy/contrib/pipeline/images.py and change that try block in media_downloaded() to:

try:
    key = self.image_key(request.url)
    checksum = self.image_downloaded(response, request, info)
except ImageException as ex:
    log.msg(str(ex), level=log.WARNING, spider=info.spider)
    raise
except IOError as ex:
    # PIL's "cannot identify image file": log it as a warning, not an error
    log.msg(str(ex), level=log.WARNING, spider=info.spider)
    raise ImageException
except Exception:
    log.err(spider=info.spider)
    raise ImageException
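
The hack above changes how each kind of failure is reported: an IOError from PIL is logged as a warning and converted to an ImageException instead of falling through to the catch-all `log.err`. Below is a minimal standard-library sketch of the same branching, assuming Python's `logging` module as a stand-in for Scrapy's old `log` module; `ImageException` here is a local stand-in and `run_download_step` is a hypothetical name.

```python
import logging

logger = logging.getLogger("images")


class ImageException(Exception):
    """Stand-in for the pipeline's 'skip this image' exception (hypothetical)."""


def run_download_step(step):
    """Call step() and classify failures the way the patched try block does."""
    try:
        return step()
    except ImageException as ex:
        # Already a known image problem: warn and re-raise it unchanged.
        logger.warning("%s", ex)
        raise
    except IOError as ex:
        # e.g. PIL's "cannot identify image file": downgrade to a warning.
        logger.warning("%s", ex)
        raise ImageException(str(ex))
    except Exception:
        # Anything unexpected is still logged at ERROR level with a traceback.
        logger.exception("unexpected failure while processing image")
        raise ImageException("unexpected failure")
```

The order of the `except` clauses matters: the specific IOError branch has to come before the catch-all `except Exception`, otherwise every failure would be reported as an error.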

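But a better option is to create your own pipeline that subclasses ImagesPipeline and overrides its image_downloaded() method, in your project's pipelines.py file: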

from scrapy import log
from scrapy.contrib.pipeline.images import ImagesPipeline

class BkamImagesPipeline(ImagesPipeline):

    def image_downloaded(self, response, request, info):
        try:
            # Return the checksum so successfully processed images are still recorded.
            return super(BkamImagesPipeline, self).image_downloaded(response, request, info)
        except IOError as ex:
            # PIL could not decode the body: log a warning instead of an error.
            log.msg(str(ex), level=log.WARNING, spider=info.spider)
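
The subclass approach relies on ordinary method overriding: the override calls the parent implementation and catches only IOError, so any other failure still propagates. The warning in the answer contains only `str(ex)`, which does not say which image failed, and that was part of the original question. The standard-library sketch below also puts the request URL in the warning. `BaseImagesPipeline`, `FakeRequest` and `FakeResponse` are hypothetical stand-ins for Scrapy's classes, and the MD5 "checksum" is only a placeholder return value.

```python
import hashlib
import logging
from collections import namedtuple

logger = logging.getLogger("pipeline")

# Hypothetical stand-ins for Scrapy's Request/Response; only .url/.body are used.
FakeRequest = namedtuple("FakeRequest", "url")
FakeResponse = namedtuple("FakeResponse", "body")


class BaseImagesPipeline(object):
    """Stand-in for ImagesPipeline: fails like PIL on non-PNG bytes."""

    def image_downloaded(self, response, request, info):
        if not response.body.startswith(b"\x89PNG"):
            raise IOError("cannot identify image file")
        # Placeholder checksum of the downloaded bytes.
        return hashlib.md5(response.body).hexdigest()


class TolerantImagesPipeline(BaseImagesPipeline):
    def image_downloaded(self, response, request, info):
        try:
            # Pass the parent's checksum through on success.
            return super(TolerantImagesPipeline, self).image_downloaded(
                response, request, info)
        except IOError as ex:
            # Downgrade to a warning and name the URL that failed.
            logger.warning("%s (url: %s)", ex, request.url)
            return None
```

Returning None on failure is a choice made for this sketch; whether that is the right fallback depends on how the real media_downloaded() uses the returned checksum.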

Be sure to declare this pipeline in your settings file:

ITEM_PIPELINES = [
    'bkam.pipelines.BkamImagesPipeline',
]
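
The list form above matches the Scrapy version used in the question. Later Scrapy releases expect ITEM_PIPELINES to be a dict that maps each pipeline's class path to an integer setting its order (lower values run first, customarily in the 0 to 1000 range). A sketch of the equivalent setting, assuming the same `bkam` project and an arbitrary order value of 1:

```python
# settings.py, dict form used by later Scrapy versions.
# The order value 1 is arbitrary; lower numbers run earlier.
ITEM_PIPELINES = {
    "bkam.pipelines.BkamImagesPipeline": 1,
}
```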

For this question (python - scrapy error: exceptions.IOError: cannot identify image file), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12044488/


python - IOError : [Errno 2] - Can permissions cause an IOError Errno 2 when using open()
I have a Python script that creates a tar file, moves files into it, and then deletes them. I can run the script manually without any problem, but when it runs from cron it fails with: IOError: [Err
open() gives FileNotFoundError / IOError: '[Errno 2] No such file or directory'
I'm trying to open the file recentlyUpdated.yaml from my Python script, but when I try, I get an error. Why? How can I fix this?
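django - IOError: request data read error
When trying to load data as an Excel file in a response, I get an IOError: request data read error. def convert_to_excel(request): field = forms.CharField()
Python - IOError when walking a folder tree
I'm trying to read a series of DICOM files in a folder tree, and I'm using the code below to walk the tree and read each file along the way. The problem is that I get IOErrors for files that definitely exist; I have already checked the file permissions and other SO threads, such as Pyth
python - try/except not catching an IOError raised from a class
I have a class that reads files in a specific format. These files tend to be larger than 8 GB, so they are usually compressed. When reading a file, I want to catch the error raised when the file is not compressed, but neither except IOError: nor except: catches it
python - IOError when trying to open a file that exists
This question already has answers here: open() gives FileNotFoundError / IOError: '[Errno 2] No such file or directory' (8 answers)
python - Unit testing IOError exception handling
Given this code: try: #do something except IOError as message: logging.error(message)
python - Help with an IOError when reading a file
for subdir, dirs, files in os.walk(crawlFolder): for file in files: print os.getcwd()
Python: IOError: missing argument in call
I'm trying to import a module and create an object of its class, like this: >>> import scriptsim >>> scriptsim.Simulator() but I get the following error: Traceback (most rece
python - IOError on the production server
I'm writing a web application in Django using the following third-party libraries: Django==1.6.1 argparse==1.2.1 cffi==0.8.1 pycparser==2.10 pylast==0.5
Python CronTab IOError
I'm using the python-crontab module to make sure my script runs every day at 2 PM, but I get some IOErrors when the script runs. Traceback: File "backup.py", li
python - IOError when writing to a file
I'm writing a program that changes my desktop background. It does this by reading a text file: if the text file names one of the BG files, it saves that one as my background, writes the other one's name to the file, and closes it. I can't seem to get it to work. Here is my
java - When might an IOError be thrown?
I have never seen an IOError being thrown. The only thing the documentation says about IOError is: Thrown when a serious I/O error has occurred. There are no subclasses or anything else obvious.
python - IOError : [Errno 13]
I'm trying to download a link and save it in the downloads folder, but I get a permissions error. I'm an administrator on the computer and I'm running it in administrator mode, yet I still get the same error. Here is the code I'm using: urllib.urlretri
haskell - How to combine IOError exceptions with local, application-specific exceptions?
I'm building a Haskell application and trying to work out how to structure the error-handling mechanism. In the real application I do a lot of work with Mongo, but here I'll simplify it to basic IO operations on files. So,
rabbitmq - Celery throws IOError when retrying a task
When I try to retry a failed task, I intermittently (about 20% of the time) get an IOError exception from Celery. Here is my task: @task def update_data(pk_id): tr
python - Opening a different file on IOError
I've written a Python script that should write its log to the file /var/log/myapp.log. However, on some platforms this doesn't exist, or we may not have permission to write there. In that case, I'd like to try writing somewhere else. def g
python - Can't figure out what is causing an IOError
I've written a script using getopts that accepts four user inputs (two input files and two output files), but for some reason I keep getting this error: python2.7 compare_files.py -b /tmp/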
