
python - Scrapy error: Error processing {'image_urls' :

Reposted · Author: 行者123 · Updated: 2023-12-01 00:46:29

I'm setting up a simple spider to download images from xkcd. This is the code I have so far:

Spider:

import scrapy
from scrapy.loader import ItemLoader

from test_im.items import TestImItem

class SpiderSpider(scrapy.Spider):
    name = 'spider_'
    allowed_domains = ['xkcd.com/']
    start_urls = ['http://xkcd.com//']

    def parse(self, response):
        test_item = TestImItem()
        relative_url = response.xpath('//*[@id="comic"]//@src').extract_first()
        image_urls = (response.urljoin(relative_url) )
        print (image_urls)
        test_item['image_urls'] = image_urls
        yield test_item

Items:

import scrapy

class TestImItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    image_urls = scrapy.Field()

Settings:

ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = '/home/luis/Documentos/proyectos/test_im/test_im/images/'

I get the following error:

2019-07-08 21:25:13 [scrapy.core.scraper] ERROR: Error processing {'image_urls': 'https://imgs.xkcd.com/comics/trained_a_neural_net.png'}
Traceback (most recent call last):
  File "/home/luis/anaconda3/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/luis/anaconda3/lib/python3.7/site-packages/scrapy/pipelines/media.py", line 79, in process_item
    requests = arg_to_iter(self.get_media_requests(item, info))
  File "/home/luis/anaconda3/lib/python3.7/site-packages/scrapy/pipelines/images.py", line 155, in get_media_requests
    return [Request(x) for x in item.get(self.images_urls_field, [])]
  File "/home/luis/anaconda3/lib/python3.7/site-packages/scrapy/pipelines/images.py", line 155, in <listcomp>
    return [Request(x) for x in item.get(self.images_urls_field, [])]
  File "/home/luis/anaconda3/lib/python3.7/site-packages/scrapy/http/request/__init__.py", line 25, in __init__
    self._set_url(url)
  File "/home/luis/anaconda3/lib/python3.7/site-packages/scrapy/http/request/__init__.py", line 62, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: h

As far as I understand, "ValueError: Missing scheme in request url: h" means that the image URL is wrong.

But I can open it in my browser without any problem.

'image_urls': 'https://imgs.xkcd.com/comics/trained_a_neural_net.png'

Best answer

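Scrapy treats the value of image_urls as a list of image URLs and creates one request per element. When you assign a plain string, it is iterated character by character, so the first "URL" it tries to request is "h", which has no scheme. Wrap the URL in a list: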

test_item['image_urls'] = [image_urls]
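To see why the list matters, here is a minimal sketch that uses only the standard library (no Scrapy). It mimics the list comprehension visible in the traceback, [Request(x) for x in item.get(...)], and shows that a bare string yields one "URL" per character. The helpers check_url and media_requests are hypothetical stand-ins for Scrapy's Request scheme check and ImagesPipeline.get_media_requests; they are not Scrapy's actual code.

```python
from urllib.parse import urlparse


def check_url(url):
    """Hypothetical stand-in for Scrapy's scheme check: reject URLs with no scheme."""
    if not urlparse(url).scheme:
        raise ValueError("Missing scheme in request url: %s" % url)
    return url


def media_requests(item, field="image_urls"):
    """Hypothetical mimic of get_media_requests: one "request" per element of the field."""
    return [check_url(x) for x in item.get(field, [])]


url = "https://imgs.xkcd.com/comics/trained_a_neural_net.png"

# A bare string is iterated character by character, so the first "URL" is "h".
try:
    media_requests({"image_urls": url})
except ValueError as exc:
    print(exc)  # Missing scheme in request url: h

# Wrapping the string in a list gives exactly one URL to request.
print(media_requests({"image_urls": [url]}))
```

The same reasoning applies if you later collect several images per page: the field should always hold a list (or other iterable) of absolute URLs, even when there is only one.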

Regarding python - Scrapy error: Error processing {'image_urls' :, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/56944287/

Copyright 2021 - 2024 cfsdn All Rights Reserved. Sichuan ICP Registration No. 2022000587