python - Scrapy ImagesPipeline not downloading images

Reposted by: 行者123. Last updated: 2023-11-28 18:32:12

I'm running a Scrapy spider in Python to scrape images from a website. After trying a few other approaches, I'm now trying to use an ImagesPipeline to do this.

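items.py:

import scrapy


class NHTSAItem(scrapy.Item):
    image_urls = scrapy.Field()  # URLs the ImagesPipeline should download
    images = scrapy.Field()      # filled in by the pipeline with download results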

settings.py:

ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
# Raw string so the backslashes in the Windows path are kept literally
IMAGES_STORE = r'C:\Users\me\Desktop'
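Windows paths in Python string literals deserve some care, because a backslash can start an escape sequence. On Python 3 in particular, `\U` begins a Unicode escape, so a plain (non-raw) literal containing `C:\Users` would not compile. The following is a minimal sketch of the usual safe spellings, using the path from the settings above; the variable names are illustrative only.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally, no escape processing.
raw_path = r"C:\Users\me\Desktop"

# Same path with every backslash escaped by hand.
escaped_path = "C:\\Users\\me\\Desktop"

# Same path written with forward slashes.
forward_path = "C:/Users/me/Desktop"

# The raw and escaped literals are the same string.
same_string = raw_path == escaped_path

# PureWindowsPath treats both separator styles as the same path.
same_path = PureWindowsPath(forward_path) == PureWindowsPath(raw_path)

print(same_string, same_path)
```

Any of the three spellings should be usable as the `IMAGES_STORE` value; the raw string is the smallest change to the original line.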

myspider.py:

def parse_photo_page(self, response):
    item = NHTSAItem()
    for sel in response.xpath('//table[@id="tblData"]/tr'):
        url = sel.xpath('td/font/a/@href').extract()
        table_fields = sel.xpath('td/font/text()').extract()
        if url:
            base_url_photo = "http://www-nrd.nhtsa.dot.gov"
            full_url = base_url_photo + url[0]
            if not item:
                item['image_urls'] = [full_url]
            else: 
                item['image_urls'].append(full_url)
    return item
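As a side note on the URL handling in this method, the absolute URLs can also be built with `urljoin` instead of string concatenation, and collected in one pass rather than by checking whether the item is empty. This is only a sketch: the helper name `build_image_urls` is hypothetical, and the sample `href` values are inferred from the URLs in the log output, not taken from the page itself. Inside a spider callback, Scrapy's `response.urljoin(href)` resolves a link against the response URL in a similar way.

```python
from urllib.parse import urljoin

# Base URL taken from the spider in the question.
BASE_URL = "http://www-nrd.nhtsa.dot.gov"


def build_image_urls(hrefs, base_url=BASE_URL):
    """Resolve each href against base_url, skipping empty values."""
    return [urljoin(base_url, href) for href in hrefs if href]


# Assumed sample hrefs, shaped like the links the XPath would extract.
hrefs = [
    "/database/MEDIA/GetMedia.aspx?tstno=4000&index=1&database=V&type=P",
    "",  # a table row without a link
    "/database/MEDIA/GetMedia.aspx?tstno=4000&index=2&database=V&type=P",
]

image_urls = build_image_urls(hrefs)
for url in image_urls:
    print(url)
```

The resulting list could be assigned to `item['image_urls']` directly.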

No errors appear; the images simply aren't downloaded. The debug output even says "Scraped". Here is the log:

DEBUG: Scraped from <200 http://www-nrd.nhtsa.dot.gov/database/VSR/veh/../SearchMedia.aspx?database=v&tstno=4000&mediatype=p&p_tstno=4000>
{'image_urls': [u'http://www-nrd.nhtsa.dot.gov/database/MEDIA/GetMedia.aspx?tstno=4000&index=1&database=V&type=P',
            u'http://www-nrd.nhtsa.dot.gov/database/MEDIA/GetMedia.aspx?tstno=4000&index=2&database=V&type=P',
            u'http://www-nrd.nhtsa.dot.gov/database/MEDIA/GetMedia.aspx?tstno=4000&index=3&database=V&type=P',
            u'http://www-nrd.nhtsa.dot.gov/database/MEDIA/GetMedia.aspx?tstno=4000&index=4&database=V&type=P',
            u'http://www-nrd.nhtsa.dot.gov/database/MEDIA/GetMedia.aspx?tstno=4000&index=5&database=V&type=P']}

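I don't need to extend the pipeline (write a custom one); the default image pipeline is fine. The images are nowhere to be found. Any idea what I'm doing wrong?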

Best answer

Here is the solution I got from this related question: Scrapy: Error 10054 after retrying image download (thanks to @neverlastn)

I simply added this code to my actual spider.py file, inside the spider class.

custom_settings = {
    "ITEM_PIPELINES": {'scrapy.pipelines.images.ImagesPipeline': 1},
    # save_location is a placeholder for the directory where images should be stored
    "IMAGES_STORE": save_location,
}

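I think the spider was not picking up my settings.py file correctly, so the image pipeline was never enabled. I'm not sure how to make it reference my settings file properly, but this solution is good enough for me!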

This post on python - Scrapy ImagesPipeline not downloading images is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/35873790/
