python - Passing an XPath as an argument to Scrapy

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:18:10

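I'm trying to write a generic crawler for a single web page that is called with the following arguments:

  • the allowed domain
  • the URL to crawl
  • the XPath used to extract the price from the page

The URL and allowed-domain arguments seem to work fine, but I can't get the XPath argument to work.

I'm guessing I need to declare a variable to hold it properly, since the other two arguments are assigned to existing class attributes.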

Here is my spider:

import scrapy
from Spotlite.items import SpotliteItem

class GenericSpider(scrapy.Spider):
    name = "generic"

    def __init__(self, start_url=None, allowed_domains=None, xpath_string=None, *args, **kwargs):
        super(GenericSpider, self).__init__(*args, **kwargs)
        self.start_urls = ['%s' % start_url]
        self.allowed_domains = ['%s' % allowed_domains]
        xpath_string = ['%s' % xpath_string]

    def parse(self, response):
        self.logger.info('Hi, this is an item page! %s', response.url)
        item = SpotliteItem()
        item['url'] = response.url
        item['price'] = response.xpath(xpath_string).extract()
        return item

I get the following error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/ubuntu/spotlite/spotlite/spiders/generic.py", line 23, in parse
    item['price'] = response.xpath(xpath_string).extract()
NameError: global name 'xpath_string' is not defined

Any help would be greatly appreciated!

Thanks,

Michael

Best answer

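Store xpath_string as an instance variable. Inside __init__ the name xpath_string is only a local variable, so parse() cannot see it; assigning it to self.xpath_string makes it available to every method. Also keep it as a plain string rather than wrapping it in a list: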

import scrapy
from Spotlite.items import SpotliteItem

class GenericSpider(scrapy.Spider):
    name = "generic"

    def __init__(self, start_url=None, allowed_domains=None, xpath_string=None, *args, **kwargs):
        super(GenericSpider, self).__init__(*args, **kwargs)
        self.start_urls = ['%s' % start_url]
        self.allowed_domains = ['%s' % allowed_domains]
        # Keep the XPath on the instance so parse() can reach it later.
        self.xpath_string = xpath_string

    def parse(self, response):
        self.logger.info('Hi, this is an item page! %s', response.url)
        item = SpotliteItem()
        item['url'] = response.url
        # Read the XPath back from the instance, not from a bare name.
        item['price'] = response.xpath(self.xpath_string).extract()
        return item
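
To see why the original version raised NameError, here is a minimal sketch that reproduces the same scoping behaviour without Scrapy. BrokenSpider, FixedSpider, try_parse and the sample XPath are hypothetical names made up for illustration; they are not part of the question's project or of Scrapy itself.

```python
class BrokenSpider(object):
    """Hypothetical stand-in for the question's spider (no Scrapy needed)."""

    def __init__(self, xpath_string=None):
        # This only rebinds the local parameter; nothing is stored on self.
        xpath_string = ['%s' % xpath_string]

    def parse(self):
        # No local or global named xpath_string exists here, so this
        # lookup raises NameError, just like in the traceback.
        return xpath_string


class FixedSpider(object):
    """Same shape, but the argument is kept as an instance attribute."""

    def __init__(self, xpath_string=None):
        self.xpath_string = xpath_string

    def parse(self):
        return self.xpath_string


def try_parse(spider):
    """Return the parse() result, or the name of the exception it raised."""
    try:
        return spider.parse()
    except NameError as exc:
        return type(exc).__name__


price_xpath = '//span[@class="price"]/text()'  # hypothetical XPath
broken_result = try_parse(BrokenSpider(price_xpath))
fixed_result = try_parse(FixedSpider(price_xpath))
print(broken_result)
print(fixed_result)
```

The same rule applies to the real spider: Scrapy hands each -a name=value option of scrapy crawl to __init__ as a keyword argument, and anything parse() needs later has to be stored on self.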

This question about python - passing an XPath as an argument to Scrapy comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/38728973/
