
python - Why does scrapy throw an error for me when trying to spider and parse a site?

Reposted. Author: 太空狗. Updated: 2023-10-29 22:11:54

The following code

class SiteSpider(BaseSpider):
    name = "some_site.com"
    allowed_domains = ["some_site.com"]
    start_urls = [
        "some_site.com/something/another/PRODUCT-CATEGORY1_10652_-1__85667",
    ]
    rules = (
        Rule(SgmlLinkExtractor(allow=('some_site.com/something/another/PRODUCT-CATEGORY_(.*)', ))),

        # Extract links matching 'item.php' and parse them with the spider's method parse_item
        Rule(SgmlLinkExtractor(allow=('some_site.com/something/another/PRODUCT-DETAIL(.*)', )), callback="parse_item"),
    )

    def parse_item(self, response):
        .... parse stuff

throws the following error:

Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1174, in mainLoop
    self.runUntilCurrent()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 796, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 318, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 424, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 441, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/usr/lib/pymodules/python2.6/scrapy/spider.py", line 62, in parse
    raise NotImplementedError
exceptions.NotImplementedError:

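When I change the callback to "parse" and rename the function to "parse", I don't get any errors, but nothing is scraped. I changed it to "parse_items", thinking I might be overriding the parse method by accident. Perhaps I'm setting up the link extractor wrong?

What I want to do is parse each ITEM link on the CATEGORY page. Am I doing this totally wrong?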
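On the link extractor question, one thing that can be checked without running a crawl is whether the `allow` patterns match the URLs you expect. The sketch below is only a rough check using the standard `re` module. It assumes the link extractor treats each `allow` entry as a regular expression searched for anywhere in the URL, which is my understanding of Scrapy's behaviour rather than something stated in the question. The `matches` helper and the detail URL are made up for illustration; the patterns and the start URL are copied from the spider above.

```python
import re

# Patterns copied from the rules in the question.
CATEGORY_PATTERN = r"some_site.com/something/another/PRODUCT-CATEGORY_(.*)"
DETAIL_PATTERN = r"some_site.com/something/another/PRODUCT-DETAIL(.*)"

# The start URL from the question, plus a hypothetical detail URL.
start_url = "some_site.com/something/another/PRODUCT-CATEGORY1_10652_-1__85667"
detail_url = "some_site.com/something/another/PRODUCT-DETAIL_12345"  # made up


def matches(pattern, url):
    """Return True if the regex is found anywhere in the URL (assumed semantics)."""
    return re.search(pattern, url) is not None


category_hit = matches(CATEGORY_PATTERN, start_url)
detail_hit = matches(DETAIL_PATTERN, detail_url)
print(category_hit, detail_hit)
```

Under those assumptions the category pattern does not match the start URL, because the URL has `PRODUCT-CATEGORY1_` where the pattern expects `PRODUCT-CATEGORY_`. If the other category links on the site follow the same shape as the start URL, the first rule may never follow them, independently of the NotImplementedError.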

Best answer

I needed to change BaseSpider to CrawlSpider. Thanks, scrapy users!

http://groups.google.com/group/scrapy-users/browse_thread/thread/4adaba51f7bcd0af#

Hi Bob,

Perhaps it might work if you change from BaseSpider to CrawlSpider? The BaseSpider seems not implement Rule, see:

http://doc.scrapy.org/topics/spiders.html?highlight=rule#scrapy.contr...

-M
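To make the reasoning concrete, here is a toy model of the two base classes in plain Python. It does not import Scrapy and is not Scrapy's implementation: `BaseSpiderModel`, `CrawlSpiderModel`, the `(substring, callback name)` rule format, and the fake `page` are all invented for illustration. It only shows the shape of the problem: a base class whose `parse` is a stub that ignores `rules`, versus a subclass whose `parse` is the place where the rules are applied.

```python
class BaseSpiderModel:
    """Toy stand-in for BaseSpider: it ignores `rules` entirely."""

    rules = ()

    def parse(self, response):
        # Mirrors the last frame of the traceback: the default parse() is a stub.
        raise NotImplementedError


class CrawlSpiderModel(BaseSpiderModel):
    """Toy stand-in for CrawlSpider: parse() is where the rules get applied."""

    def parse(self, response):
        items = []
        for url_fragment, callback_name in self.rules:
            if callback_name is None:
                continue  # follow-only rule; this sketch does not follow links
            callback = getattr(self, callback_name)
            for link in response["links"]:
                if url_fragment in link:
                    items.append(callback(link))
        return items


# Hypothetical rules: (substring to look for, name of the callback method).
RULES = (
    ("PRODUCT-CATEGORY", None),
    ("PRODUCT-DETAIL", "parse_item"),
)


class BrokenSpider(BaseSpiderModel):
    rules = RULES

    def parse_item(self, link):
        return {"url": link}


class FixedSpider(CrawlSpiderModel):
    rules = RULES

    def parse_item(self, link):
        return {"url": link}


# A fake "response": just the links found on a category page (made up).
page = {"links": ["/another/PRODUCT-DETAIL_1", "/another/PRODUCT-CATEGORY_2"]}

try:
    BrokenSpider().parse(page)
    broken_error = None
except NotImplementedError as exc:
    broken_error = exc

fixed_items = FixedSpider().parse(page)
print(type(broken_error).__name__, fixed_items)
```

In this model the broken spider fails exactly as in the traceback, while the fixed one routes the detail link to `parse_item`. It also suggests why naming the callback `parse` is risky on a crawl spider: overriding `parse` would replace the method that applies the rules, and the Scrapy documentation warns against using `parse` as a rule callback for that reason.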

Regarding "python - Why does scrapy throw an error for me when trying to spider and parse a site?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/5264829/
