
python - How to stop the reactor when both spiders have finished


I have this code, and after both spiders finish, the program is still running.

#!C:\Python27\python.exe

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from carrefour.spiders.tesco import TescoSpider
from carrefour.spiders.carr import CarrSpider
from scrapy.utils.project import get_project_settings
import threading
import time

def tescofcn():
    tescoSpider = TescoSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(tescoSpider)
    crawler.start()

def carrfcn():
    carrSpider = CarrSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(carrSpider)
    crawler.start()


t1 = threading.Thread(target=tescofcn)
t2 = threading.Thread(target=carrfcn)

t1.start()
t2.start()
log.start()
reactor.run()

When I tried inserting this into both functions:

crawler.signals.connect(reactor.stop, signal=signals.spider_closed)

the reactor was terminated as soon as the faster spider closed, even though the slower spider was still running: the first spider_closed signal stops the reactor for the entire process.

Best Answer

What you can do is create a function that checks the list of running spiders and connect it to signals.spider_closed:

from scrapy.utils.trackref import iter_all


def close_reactor_if_no_spiders():
    running_spiders = [spider for spider in iter_all('Spider')]

    if not running_spiders:
        reactor.stop()

crawler.signals.connect(close_reactor_if_no_spiders, signal=signals.spider_closed)
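Applied to the question's code, each function would connect this handler in place of reactor.stop; a sketch assuming the same old-style Crawler API used above (carrfcn() would be wired identically):

def tescofcn():
    tescoSpider = TescoSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    # stop the reactor only once no spiders are left running
    crawler.signals.connect(close_reactor_if_no_spiders, signal=signals.spider_closed)
    crawler.crawl(tescoSpider)
    crawler.start()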

That said, I would still recommend using scrapyd to manage running multiple spiders.
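Alternatively, if upgrading is an option: Scrapy 1.0+ ships CrawlerProcess, which runs multiple spiders on a single reactor and stops it automatically once every crawl has finished; a minimal sketch, with no threads or signal wiring needed:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from carrefour.spiders.tesco import TescoSpider
from carrefour.spiders.carr import CarrSpider

process = CrawlerProcess(get_project_settings())
process.crawl(TescoSpider)
process.crawl(CarrSpider)
process.start()  # blocks until both spiders finish, then stops the reactor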

Regarding python - how to stop the reactor when both spiders have finished, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25480298/
