
python - Getting an error when trying to parse a page with scrapy

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:27:44

When I try to fetch the page content, I get this error in the console:

    2018-11-08 20:55:34 [scrapy.core.engine] INFO: Spider opened
    2018-11-08 20:55:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2018-11-08 20:55:34 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
    2018-11-08 20:55:34 [scrapy.core.engine] ERROR: Error while obtaining start requests
    Traceback (most recent call last):
      File "c:\python36\lib\site-packages\scrapy\core\engine.py", line 127, in _next_request
        request = next(slot.start_requests)
      File "c:\python36\lib\site-packages\scrapy\spiders\__init__.py", line 83, in start_requests
        yield Request(url, dont_filter=True)
      File "c:\python36\lib\site-packages\scrapy\http\request\__init__.py", line 25, in __init__
        self._set_url(url)
      File "c:\python36\lib\site-packages\scrapy\http\request\__init__.py", line 62, in _set_url
        raise ValueError('Missing scheme in request url: %s' % self._url)

My code looks like this:

    import scrapy

    class Shopee(scrapy.Spider):

        name = 'Shopee'
        start_urls = ['http://www.shopee.sg/Games-Hobbies-cat.14']

        def parse(self, response):
            print(response.text)

Best Answer

The error message in your post is caused by a missing http(s):// in start_urls. I suspect you updated your code but forgot to update the error message, since the code you posted already includes the scheme.
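You can check for this condition up front with the standard library, before Scrapy raises. The helper below (`has_scheme` is a hypothetical name, not part of Scrapy) is a minimal sketch of the validation that `Request._set_url` performs:

```python
from urllib.parse import urlparse

def has_scheme(url):
    # Scrapy's Request raises ValueError('Missing scheme in request url: ...')
    # when the URL has no scheme; this mirrors that check with urlparse.
    return urlparse(url).scheme in ("http", "https")

print(has_scheme("www.shopee.sg/Games-Hobbies-cat.14"))         # no scheme -> would raise in Scrapy
print(has_scheme("http://www.shopee.sg/Games-Hobbies-cat.14"))  # ok
```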

However, after running your code it appears the site blocks clients based on their User-Agent. Try sending a browser's user-agent string instead. For example:

    name = 'Shopee'
    start_urls = ['http://www.shopee.sg/Games-Hobbies-cat.14']
    custom_settings = {
        'DEFAULT_REQUEST_HEADERS': {
            'User-Agent': (
                'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14'
                ' (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A'
            )
        }
    }
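As an aside, Scrapy also exposes a dedicated USER_AGENT setting that covers all requests the spider makes; a minimal sketch of that alternative (the UA string is just an example browser value):

```python
# Equivalent alternative: Scrapy's USER_AGENT setting applies the
# user-agent string to every request without touching DEFAULT_REQUEST_HEADERS.
custom_settings = {
    'USER_AGENT': (
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14'
        ' (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A'
    )
}
```

Either approach works; DEFAULT_REQUEST_HEADERS is useful when you want to override several headers at once.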

Regarding "python - Getting an error when trying to parse a page with scrapy", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53214316/
