
python - Being able to change settings while running scrapy from a script

Reposted. Author: 太空狗. Updated: 2023-10-30 02:43:32

I want to run scrapy from a single script. I want it to pick up all the settings from settings.py, but I'd like to be able to change some of them:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

# What I'm missing here is a way to set or override one or two of the settings.


# 'testspider' is the name of one of the spiders of the project.
process.crawl('testspider', domain='scrapinghub.com')
process.start() # the script will block here until the crawling is finished

I wasn't able to use this. I tried the following:

settings = scrapy.settings.Settings()
settings.set('RETRY_TIMES', 10)

But it didn't work.

Note: I'm using the latest version of scrapy.

Best answer

So, to override some settings, one way is to set (or override) the spider's class attribute custom_settings in our script.

So I imported the spider's class and then overrode custom_settings:

from testspiders.spiders.followall import FollowAllSpider

FollowAllSpider.custom_settings = {'RETRY_TIMES': 10}

Here is the whole script:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from testspiders.spiders.followall import FollowAllSpider

FollowAllSpider.custom_settings = {'RETRY_TIMES': 10}
process = CrawlerProcess(get_project_settings())


# Pass the spider class itself so the patched custom_settings are guaranteed to apply.
process.crawl(FollowAllSpider, domain='scrapinghub.com')
process.start() # the script will block here until the crawling is finished

A similar question, "python - Being able to change settings while running scrapy from a script", can be found on Stack Overflow: https://stackoverflow.com/questions/33094306/
