python - Error (NotImplementedError) when trying to crawl with a spider

Reposted. Author: 行者123. Updated: 2023-12-03 08:07:34

My Scrapy code doesn't work. I'm trying to scrape a forum, but I get an error.
Here is my code:

import scrapy, time

class ForumSpiderSpider(scrapy.Spider):
    name = 'forum_spider'
    allowed_domains = ['visforvoltage.org/latest_tech/']
    start_urls = ['http://visforvoltage.org/latest_tech//']

    def parse_urls(self, response):
        for href in response.css(r"tbody a[href*='/forum/']::attr(href)").extract():
            url = response.urljoin(href)
            print(url)
            req = scrapy.Request(url, callback=self.parse_data)
            time.sleep(10)
            yield req

    def parse_data(self, response):
        for sel in response.css('html').extract():
            data = {}
            data['name'] = response.css(r"div[class='author-pane-line author-name'] span[class='username']::text").extract()
            data['date'] = response.css(r"div[class='forum-posted-on']:contains('-') ::text").extract()
            data['title'] = response.css(r"div[class='section'] h1[class='title']::text").extract()
            data['body'] = response.css(r"div[class='field-items'] p::text").extract()
            yield data

        next_page = response.css(r"li[class='pager-next'] a[href*='page=']::attr(href)").extract()
        if next_page:
            yield scrapy.Request(
                response.urljoin(next_page),
                callback=self.parse_urls)
Here is the error:
[scrapy.core.scraper] ERROR: Spider error processing <GET https://visforvoltage.org/latest_tech> (referer: None)
raise NotImplementedError('{}.parse callback is not defined'.format(self.__class__.__name__))
NotImplementedError: ForumSpiderSpider.parse callback is not defined
I would be very grateful if someone could help me!

Best answer

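The parent class scrapy.Spider has a method called start_requests. That method reads your start_urls and creates the spider's first requests.
Those requests do not name a callback, so Scrapy falls back to a method called parse, which your spider does not define. The quickest way to solve the problem is therefore to rename your parse_urls method to parse, like this: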

    def parse(self, response):
        for href in response.css(r"tbody a[href*='/forum/']::attr(href)").extract():
            url = response.urljoin(href)
            print(url)
            req = scrapy.Request(url, callback=self.parse_data)
            time.sleep(10)
            yield req

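If you want to change that behaviour instead, override the start_requests method in your class so that you decide which callback is used. For example:

    def start_requests(self):
        for url in self.start_urls:
            # Name the callback explicitly instead of relying on parse.
            yield scrapy.Request(url, callback=self.parse_urls, dont_filter=True)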
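
The two fixes can be compared side by side in a simplified, standard-library-only model. This is only a sketch: MiniSpider, crawl, and the three spider subclasses are hypothetical stand-ins, not Scrapy's real classes, requests are plain dicts, and the "response" passed to each callback is just the URL string. The only things borrowed from Scrapy are the rule that a request without a callback ends up in parse, and the error message from the traceback in the question.

```python
class MiniSpider:
    """Hypothetical stand-in for scrapy.Spider; not the real class."""

    start_urls = []

    def start_requests(self):
        # Default behaviour: one request per start URL, no callback set.
        for url in self.start_urls:
            yield {"url": url, "callback": None}

    def parse(self, response):
        # Same message as in the traceback from the question.
        raise NotImplementedError(
            "{}.parse callback is not defined".format(self.__class__.__name__)
        )


def crawl(spider):
    """Toy engine: pass each start URL to its callback, defaulting to parse."""
    results = []
    for request in spider.start_requests():
        callback = request["callback"] or spider.parse
        results.append(callback(request["url"]))
    return results


class BrokenSpider(MiniSpider):
    start_urls = ["http://visforvoltage.org/latest_tech/"]

    def parse_urls(self, response):  # never reached: nothing refers to it
        return "parse_urls got " + response


class RenamedSpider(MiniSpider):
    start_urls = ["http://visforvoltage.org/latest_tech/"]

    def parse(self, response):  # fix 1: the default fallback now finds it
        return "parse got " + response


class OverridingSpider(BrokenSpider):
    def start_requests(self):  # fix 2: name the callback explicitly
        for url in self.start_urls:
            yield {"url": url, "callback": self.parse_urls}


try:
    crawl(BrokenSpider())
    broken_error = None
except NotImplementedError as exc:
    broken_error = str(exc)

renamed_results = crawl(RenamedSpider())
overriding_results = crawl(OverridingSpider())
print(broken_error)
print(renamed_results)
print(overriding_results)
```

In this model BrokenSpider fails the same way the spider in the question does, while the other two reach a callback. Which fix to prefer is a design choice: renaming is the smaller change, whereas overriding start_requests keeps the more descriptive method name parse_urls.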

Regarding python - Error (NotImplementedError) when trying to crawl with a spider, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63063388/
