
python - How to pass two user-defined arguments to a scrapy spider

Reposted · Author: 行者123 · Updated: 2023-11-28 22:35:13

Following How to pass a user defined argument in scrapy spider, I wrote the simple spider below:

import scrapy

class Funda1Spider(scrapy.Spider):
    name = "funda1"
    allowed_domains = ["funda.nl"]

    def __init__(self, place='amsterdam'):
        self.start_urls = ["http://www.funda.nl/koop/%s/" % place]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)

This seems to work; for example, if I run it from the command line with

scrapy crawl funda1 -a place=rotterdam

it generates a rotterdam.html which looks similar to http://www.funda.nl/koop/rotterdam/. Next, I would like to extend this so that I can specify a subpage, such as http://www.funda.nl/koop/rotterdam/p2/. I tried the following:

import scrapy

class Funda1Spider(scrapy.Spider):
    name = "funda1"
    allowed_domains = ["funda.nl"]

    def __init__(self, place='amsterdam', page=''):
        self.start_urls = ["http://www.funda.nl/koop/%s/p%s/" % (place, page)]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
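As a side note, with a page argument the filename logic above no longer produces rotterdam.html: splitting the new URL on "/" makes the second-to-last element the page segment, not the place. A quick plain-Python check of the URL template and filename logic (no Scrapy needed):

```python
# Verify the URL template and the filename derivation used in the spider above.
place, page = "rotterdam", "2"
url = "http://www.funda.nl/koop/%s/p%s/" % (place, page)
print(url)  # http://www.funda.nl/koop/rotterdam/p2/

# split("/") on a URL ending in "/" leaves a trailing empty string,
# so [-2] picks the last path segment -- here "p2", not "rotterdam".
filename = url.split("/")[-2] + ".html"
print(filename)  # p2.html
```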

However, if I try to run it with

scrapy crawl funda1 -a place=rotterdam page=2

I get the following error:

crawl: error: running 'scrapy crawl' with more than one spider is no longer supported

I don't really understand this error message, since I am not trying to crawl two spiders; I simply want to pass two keyword arguments to modify the start_urls. How can I do this?

Best Answer

When supplying more than one argument, you need to prefix each argument with -a. Without the second -a, scrapy parses page=2 as an additional spider name, which is why you see the "more than one spider" error.

The correct line for your case is:

scrapy crawl funda1 -a place=rotterdam -a page=2
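To see why the repeated -a flags matter, here is an illustrative sketch (not Scrapy's actual implementation) of how a crawl-style command line can collect each -a NAME=VALUE pair into the keyword arguments that end up in the spider's __init__:

```python
import argparse

def parse_spider_args(argv):
    """Sketch of 'scrapy crawl'-style parsing: one spider name plus
    repeated '-a NAME=VALUE' options collected into a kwargs dict."""
    parser = argparse.ArgumentParser(prog="scrapy crawl")
    parser.add_argument("spider")
    parser.add_argument("-a", dest="spargs", action="append",
                        default=[], metavar="NAME=VALUE")
    ns = parser.parse_args(argv)
    # Split each pair only on the first "=" so values may contain "=".
    kwargs = dict(pair.split("=", 1) for pair in ns.spargs)
    return ns.spider, kwargs

name, kwargs = parse_spider_args(
    ["funda1", "-a", "place=rotterdam", "-a", "page=2"])
print(name, kwargs)  # funda1 {'place': 'rotterdam', 'page': '2'}
```

A bare token like page=2 does not match the -a option, so an argument parser treats it as another positional argument (i.e. another spider name), which matches the error message above.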

Regarding python - How to pass two user-defined arguments to a scrapy spider, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38421954/
