
python - Scrapy pagination fails on multiple listings


I am trying to scrape a website with Scrapy. Pagination works when I scrape one specific page, but when I try to crawl all the pages in one go, pagination stops working.
I tried creating a separate function for pagination, but that didn't solve the problem. Any help would be appreciated. What am I doing wrong? Here is my code:

# -*- coding: utf-8 -*-
import scrapy

from scrapy.loader.processors import MapCompose, Join
from scrapy.loader import ItemLoader
from scrapy.http import Request

from avtogumi.items import AvtogumiItem


class BasicSpider(scrapy.Spider):
    name = 'gumi'
    allowed_domains = ['avtogumi.bg']
    start_urls = ['https://bg.avtogumi.bg/oscommerce/index.php']

    def parse(self, response):
        urls = response.xpath('//div[@class="brands"]//a/@href').extract()
        for url in urls:
            url = response.urljoin(url)
            yield scrapy.Request(url=url, callback=self.parse_params)

    def parse_params(self, response):
        l = ItemLoader(item=AvtogumiItem(), response=response)

        l.add_xpath('title', '//h4/a/text()')
        l.add_xpath('subtitle', '//p[@class="ft-darkgray"]/text()')
        l.add_xpath('price', '//span[@class="promo-price"]/text()',
                    MapCompose(str.strip, str.title))
        l.add_xpath('stock', '//div[@class="product-box-stock"]//span/text()')
        l.add_xpath('category', '//div[@class="labels hidden-md hidden-lg"][0]//text()')
        l.add_xpath('brand', '//h4[@class="brand-header"][0]//text()',
                    MapCompose(str.strip, str.title))
        l.add_xpath('img_path', '//div/img[@class="prod-imglist"]/@src')

        yield l.load_item()

        next_page_url = response.xpath('//li/a[@class="next"]/@href').extract_first()
        if next_page_url:
            next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url=next_page_url, callback=self.parse_params)

Best answer

The problem here is this:

l = ItemLoader(item=AvtogumiItem(), response=response)

l.add_xpath('title', '//h4/a/text()')
l.add_xpath('subtitle', '//p[@class="ft-darkgray"]/text()')
l.add_xpath('price', '//span[@class="promo-price"]/text()',
            MapCompose(str.strip, str.title))
l.add_xpath('stock', '//div[@class="product-box-stock"]//span/text()')
l.add_xpath('category', '//div[@class="labels hidden-md hidden-lg"][0]//text()')
l.add_xpath('brand', '//h4[@class="brand-header"][0]//text()',
            MapCompose(str.strip, str.title))
l.add_xpath('img_path', '//div/img[@class="prod-imglist"]/@src')

yield l.load_item()

This code parses and loads exactly one result. If your page contains multiple results, you have to put this code inside a for loop and iterate over all the search results you want to parse:

objects = response.xpath('my_selector_here')
for obj in objects:
    # Build each item from the matched node rather than the whole response,
    # and use relative XPaths (.//) so each iteration reads its own fields.
    l = ItemLoader(item=AvtogumiItem(), selector=obj)

    l.add_xpath('title', './/h4/a/text()')
    l.add_xpath('subtitle', './/p[@class="ft-darkgray"]/text()')
    l.add_xpath('price', './/span[@class="promo-price"]/text()',
                MapCompose(str.strip, str.title))
    l.add_xpath('stock', './/div[@class="product-box-stock"]//span/text()')
    l.add_xpath('category', './/div[@class="labels hidden-md hidden-lg"]//text()')
    l.add_xpath('brand', './/h4[@class="brand-header"]//text()',
                MapCompose(str.strip, str.title))
    l.add_xpath('img_path', './/div/img[@class="prod-imglist"]/@src')

    yield l.load_item()
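To make the fix concrete, here is a minimal sketch of the full parse_params combining the per-result loop with the pagination request. The container XPath '//div[contains(@class, "product-box")]' is an assumption standing in for my_selector_here; substitute whatever selector matches a single search result on the actual page:

    def parse_params(self, response):
        # NOTE: this container XPath is an assumed placeholder; replace it with
        # the selector that matches one product listing on the real page
        for obj in response.xpath('//div[contains(@class, "product-box")]'):
            l = ItemLoader(item=AvtogumiItem(), selector=obj)
            l.add_xpath('title', './/h4/a/text()')
            l.add_xpath('price', './/span[@class="promo-price"]/text()',
                        MapCompose(str.strip, str.title))
            yield l.load_item()

        # Pagination stays outside the loop: one "next page" request per page,
        # issued after every result on the current page has been yielded.
        next_page_url = response.xpath('//li/a[@class="next"]/@href').extract_first()
        if next_page_url:
            yield scrapy.Request(url=response.urljoin(next_page_url),
                                 callback=self.parse_params)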

Hope this helps.

About python - Scrapy pagination fails on multiple listings, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52383001/
