python - Scrapy: how do I scrape items from multiple pages?

Reposted. Author: 行者123. Updated: 2023-12-01 00:24:02

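I'm trying to scrape data from # pages. I have already written a scraper that can scrape data from a single page, but it suddenly finishes after scraping the first page.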

The whole file, with the parse function and the scrape function - Scraper.py

# -*- coding: utf-8 -*-
import scrapy
import csv
import os
from scrapy.selector import Selector
from scrapy import Request

class Proddduct(scrapy.Item):
    price = scrapy.Field()
    description = scrapy.Field()
    link = scrapy.Field()
    content = scrapy.Field()


class LapadaScraperSpider(scrapy.Spider):
    name = 'lapada_scraper2'
    allowed_domains = ['http://www.lapada.org']
    start_urls = ['https://lapada.org/art-and-antiques/?search=antique']

    def parse(self, response):
        next_page_url = response.xpath("//ul/li[@class='next']//a/@href").get()

        for item in self.scrape(response):
            yield item

        if next_page_url:
            print("Found url: {}".format(next_page_url))
            yield scrapy.Request(url=next_page_url, callback=self.parse)

    def scrape(self, response):
        parser = scrapy.Selector(response)

        products = parser.xpath("//div[@class='content']")

        for product in products:
            item = Proddduct()
            XPATH_PRODUCT_DESCRIPTION = ".//strong/text()"
            XPATH_PRODUCT_PRICE = ".//div[@class='price']/text()"
            XPATH_PRODUCT_LINK = ".//a/@href"

            raw_product_description = product.xpath(XPATH_PRODUCT_DESCRIPTION).extract()
            raw_product_price = product.xpath(XPATH_PRODUCT_PRICE).extract()
            raw_product_link = product.xpath(XPATH_PRODUCT_LINK).extract_first()

            item['description'] = raw_product_description
            item['price'] = raw_product_price
            item['link'] = raw_product_link

            yield item

    def get_information(self, response):
        item = response.meta['item']
        item['phonenumber'] = "12345"
        yield item
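
The pagination pattern in parse() (yield the items on the current page, then request the page behind the "next" link until there is none) can be sketched without Scrapy as a plain loop. This is a minimal sketch, not Scrapy's scheduler: FAKE_SITE, its page URLs, the page query parameter and the items are hypothetical stand-ins for real responses. urljoin is applied in case a next link's href is relative; the sketch makes no claim about what lapada.org actually returns.

```python
from urllib.parse import urljoin

# Hypothetical in-memory "site" (invented URLs and items, for illustration only):
# each page URL maps to (items on that page, href of the "next" link or None).
FAKE_SITE = {
    "https://lapada.org/art-and-antiques/?search=antique": (["a", "b"], "?search=antique&page=2"),
    "https://lapada.org/art-and-antiques/?search=antique&page=2": (["c"], "?search=antique&page=3"),
    "https://lapada.org/art-and-antiques/?search=antique&page=3": (["d"], None),
}


def crawl(start_url):
    """Follow "next" links page by page and collect every item (mirrors parse())."""
    items, seen = [], set()
    url = start_url
    while url and url not in seen:  # stop when there is no next link or on a loop
        seen.add(url)
        page_items, next_href = FAKE_SITE[url]
        items.extend(page_items)  # like: for item in self.scrape(response): yield item
        # like: if next_page_url: yield scrapy.Request(...), joined in case href is relative
        url = urljoin(url, next_href) if next_href else None
    return items


print(crawl("https://lapada.org/art-and-antiques/?search=antique"))
```

In Scrapy itself, response.follow(next_page_url, callback=self.parse) does the same joining against response.url, so it also works when the href is relative.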

How can I scrape all the items from all the pages?

Thanks

Best answer

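Change allowed_domains = ['http://www.lapada.org'] to allowed_domains = ['lapada.org']. allowed_domains expects bare domain names, not URLs; a bare domain also covers its subdomains, such as www.lapada.org. With a URL entry, the follow-up request for the next page is treated as off-site and filtered out, so the spider stops after the first page.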
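
Why a URL entry blocks the next page can be illustrated with a simplified host check. This is a hypothetical re-implementation for illustration only, assuming the documented rule for allowed_domains (a request's host must equal a listed domain or be a subdomain of it); Scrapy's own offsite middleware differs in its details. The next-page URL is an invented example.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Simplified offsite check: allow a host equal to, or a subdomain of, an entry."""
    host = urlparse(url).hostname or ""
    for domain in allowed_domains:
        # A bare domain such as "lapada.org" matches "lapada.org" and "www.lapada.org".
        # An entry written as a URL ("http://www.lapada.org") never equals a hostname.
        if host == domain or host.endswith("." + domain):
            return True
    return False


# Hypothetical next-page URL, for illustration only.
next_page = "https://lapada.org/art-and-antiques/?search=antique&page=2"
print(is_allowed(next_page, ["http://www.lapada.org"]))  # False: request would be filtered
print(is_allowed(next_page, ["lapada.org"]))             # True: request is followed
```

Note that even ['www.lapada.org'] would not match here, because the start URL's host is lapada.org without www; the bare 'lapada.org' covers both.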

For more on python - Scrapy: how do I scrape items from multiple pages?, see the similar question on Stack Overflow: https://stackoverflow.com/questions/58745597/
