I want to limit the number of pages scraped to 5 with the code below, even though the website has 50 pages. I'm using Scrapy's CrawlSpider. How can I do that?
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class BooksSpider(CrawlSpider):
    name = "bookscraper"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com/"]

    rules = (
        # Follow each book link on a listing page and parse it.
        Rule(LinkExtractor(restrict_xpaths='//h3/a'), callback='parse_item', follow=True),
        # Follow the "next" pagination link to the following listing page.
        Rule(LinkExtractor(restrict_xpaths='//li[@class="next"]/a'), follow=True),
    )

    def parse_item(self, response):
        # The product details live in a striped table on the book page.
        product_info = response.xpath('//table[contains(@class, "table-striped")]')
        name = response.xpath('//h1/text()').get()
        upc = product_info.xpath('(./tr/td)[1]/text()').get()
        price = product_info.xpath('(./tr/td)[3]/text()').get()
        availability = product_info.xpath('(./tr/td)[6]/text()').get()
        yield {'Name': name, 'UPC': upc, 'Availability': availability, 'Price': price}
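In case it helps, here are two ways to cap the crawl, sketched against the spider above. CLOSESPIDER_PAGECOUNT is a built-in Scrapy setting that closes the spider after a fixed number of downloaded responses; note it counts every response, book pages included, so it is a blunt cap rather than exactly "5 listing pages". To limit only the pagination, a Rule can be given a process_request hook that drops "next" requests past a threshold. This is a minimal sketch assuming Scrapy >= 2.0, where process_request receives both the request and the response; MAX_PAGES and limit_pagination are names introduced here for illustration.

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

MAX_PAGES = 5  # illustrative constant: total listing pages to visit

class BooksSpider(CrawlSpider):
    name = "bookscraper"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com/"]

    # Option 1: hard-stop the whole crawl after 5 downloaded responses.
    # Counts *all* responses (book pages too), so it is only a blunt cap.
    # custom_settings = {"CLOSESPIDER_PAGECOUNT": 5}

    pages_followed = 1  # the start URL already counts as the first listing page

    def limit_pagination(self, request, response):
        # Option 2: drop pagination requests once MAX_PAGES listing pages
        # have been scheduled; returning None makes Scrapy skip the request.
        if self.pages_followed >= MAX_PAGES:
            return None
        self.pages_followed += 1
        return request

    rules = (
        Rule(LinkExtractor(restrict_xpaths='//h3/a'), callback='parse_item', follow=True),
        # process_request may be given as the name of a spider method.
        Rule(LinkExtractor(restrict_xpaths='//li[@class="next"]/a'),
             follow=True, process_request='limit_pagination'),
    )

    def parse_item(self, response):
        ...  # same item extraction as in the question

With this setup the book links on each of the five listing pages are still followed and parsed; only the "next" links beyond page 5 are discarded. DEPTH_LIMIT is another built-in setting sometimes suggested for this, but with these rules the book pages themselves add crawl depth, so it does not map cleanly onto a count of listing pages.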
Well, I have already tried that in my spider, but it still seems that the spider only scrapes the first page.
@BilalAnees, check my answer again.