
How can I limit the number of pages scraped with Scrapy CrawlSpider?




I want to limit the number of pages scraped to 5 with the code below, even though the website has 50 pages. I'm using Scrapy's CrawlSpider. How can I do that?



from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class BooksSpider(CrawlSpider):
    name = "bookscraper"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com/"]

    rules = (
        # Follow book detail links and parse them with parse_item.
        Rule(LinkExtractor(restrict_xpaths='//h3/a'),
             callback='parse_item', follow=True),
        # Follow the "next" pagination link.
        Rule(LinkExtractor(restrict_xpaths='//li[@class="next"]/a'),
             follow=True),
    )

    def parse_item(self, response):
        product_info = response.xpath('//table[contains(@class, "table-striped")]')

        name = response.xpath('//h1/text()').get()
        upc = product_info.xpath('(./tr/td)[1]/text()').get()
        price = product_info.xpath('(./tr/td)[3]/text()').get()
        availability = product_info.xpath('(./tr/td)[6]/text()').get()

        yield {'Name': name, 'UPC': upc, 'Availability': availability, 'Price': price}
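
A minimal sketch of one way to do this, using Scrapy's built-in CloseSpider extension: the CLOSESPIDER_PAGECOUNT setting closes the spider after roughly that many responses have been downloaded. Note that it counts every response (book detail pages as well as listing pages), and it is a soft limit: requests already scheduled may still be processed before the spider shuts down.

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class BooksSpider(CrawlSpider):
    name = "bookscraper"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com/"]

    # Close the spider after ~5 downloaded responses. Counts every
    # response, not just listing pages, and is a soft limit.
    custom_settings = {
        "CLOSESPIDER_PAGECOUNT": 5,
    }

    rules = (
        Rule(LinkExtractor(restrict_xpaths='//h3/a'),
             callback='parse_item', follow=True),
        Rule(LinkExtractor(restrict_xpaths='//li[@class="next"]/a'),
             follow=True),
    )

    def parse_item(self, response):
        # Extraction unchanged from the question.
        product_info = response.xpath('//table[contains(@class, "table-striped")]')
        yield {
            'Name': response.xpath('//h1/text()').get(),
            'UPC': product_info.xpath('(./tr/td)[1]/text()').get(),
            'Price': product_info.xpath('(./tr/td)[3]/text()').get(),
            'Availability': product_info.xpath('(./tr/td)[6]/text()').get(),
        }

DEPTH_LIMIT is sometimes suggested for this as well, but with these rules the depth counter increases through book links as well as pagination links, so it does not map cleanly onto a number of listing pages.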

Recommended answer


More replies

Well, I have already tried that in my spider, but it still seems that the spider is only scraping the first page.


@BilalAnees check my answer again

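
If counting every response is too coarse, here is a finer-grained sketch, assuming Scrapy 2.0 or later (where a Rule's process_request callable receives both the request and the response): count only pagination requests and drop the next-page link once the quota is reached. The limit_pages method name and the two counter attributes are our own additions, not part of the original question or answer.

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class BooksSpider(CrawlSpider):
    name = "bookscraper"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com/"]

    max_pages = 5     # listing pages to crawl, counting the start page
    pages_queued = 1  # the start page is page 1

    rules = (
        # Book detail links, parsed as in the question.
        Rule(LinkExtractor(restrict_xpaths='//h3/a'),
             callback='parse_item', follow=True),
        # Pagination links are routed through limit_pages before scheduling.
        Rule(LinkExtractor(restrict_xpaths='//li[@class="next"]/a'),
             process_request='limit_pages', follow=True),
    )

    def limit_pages(self, request, response):
        # Returning None drops the next-page request; returning the
        # request lets it through to the scheduler.
        if self.pages_queued >= self.max_pages:
            return None
        self.pages_queued += 1
        return request

    def parse_item(self, response):
        # Extraction unchanged from the question.
        product_info = response.xpath('//table[contains(@class, "table-striped")]')
        yield {
            'Name': response.xpath('//h1/text()').get(),
            'UPC': product_info.xpath('(./tr/td)[1]/text()').get(),
            'Price': product_info.xpath('(./tr/td)[3]/text()').get(),
            'Availability': product_info.xpath('(./tr/td)[6]/text()').get(),
        }

Because only requests extracted by the pagination rule pass through limit_pages, exactly five listing pages are visited, and every book on those pages is still scraped.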
