
python - How to exit a Scrapy Python script after the spider stops crawling or hits an exception?


I am trying to run my Scrapy Python script every minute from a .bat file via Windows Task Scheduler.

However, the Python script somehow never exits, and it blocks every future task that Task Scheduler tries to launch.

So my questions are:

  1. How do I exit my Scrapy script gracefully once the spider has finished crawling?

  2. How do I exit the Scrapy script when it hits an exception, in particular a ReactorNotRunning error?

Thanks in advance, everyone.

Here is the .bat file that runs the Python script:

@echo off
python "C:\Scripts\start.py"
pause

And here is my Python script:

from cineplex.spiders import seatings_spider as st  # the original aliased this as "seat" but referenced "st" below
import utils  # project helper module providing create_dir_for_today() and get_all_cinemas()

from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor, defer

PARENT_DIR = "C:\\Scripts\\output"  # assumed value; the original script never defines it


def crawl_all_showtimes():
    # Create a CrawlerRunner instance to manage multiple spiders
    runner = CrawlerRunner()

    # Create (or locate) the output folder for today
    directory_for_today = utils.create_dir_for_today(PARENT_DIR)

    # Get all cinema ids and names first
    cinema_dict = utils.get_all_cinemas()

    # Queue all the crawls
    crawl_showtimes_helper(directory_for_today, cinema_dict, runner)

    # Start crawling for showtimes; blocks until reactor.stop() is called
    reactor.run()


# Helps to run multiple ShowTimesSpiders sequentially
@defer.inlineCallbacks
def crawl_showtimes_helper(output_dir, cinema_dict, runner):
    # Iterate through all cinemas to get show timings
    # (dict.iteritems() exists only in Python 2; items() works in both)
    for cinema_id, cinema_name in cinema_dict.items():
        yield runner.crawl(st.ShowTimesSpider, cinema_id=cinema_id,
                           cinema_name=cinema_name, output_dir=output_dir)
    reactor.stop()


if __name__ == "__main__":

    # Turn on Scrapy logging
    configure_logging()

    # Collect all showtimes (the original called an undefined crawl_all_seatings())
    crawl_all_showtimes()

Best Answer

The program's main thread is being blocked by some of Scrapy's threads. So use this at the end of your main program:

import sys
sys.exit()
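
A bare sys.exit() covers the normal exit. For the second part of the question, here is a minimal, hedged sketch that reuses the crawl_all_showtimes() from the script above. twisted.internet.error.ReactorNotRunning is what Twisted raises when reactor.stop() is called after the reactor has already stopped, so the stop is wrapped; the whole run is also wrapped so the process always ends with an exit code Task Scheduler can observe.

import sys
import traceback

from twisted.internet import reactor
from twisted.internet.error import ReactorNotRunning


def safe_stop():
    # reactor.stop() raises ReactorNotRunning when the reactor has
    # already stopped; swallow that case so cleanup code can call
    # safe_stop() unconditionally.
    try:
        reactor.stop()
    except ReactorNotRunning:
        pass


if __name__ == "__main__":
    exit_code = 0
    try:
        crawl_all_showtimes()  # blocks in reactor.run() until the spiders finish
    except Exception:
        traceback.print_exc()
        safe_stop()
        exit_code = 1
    # SystemExit shuts the interpreter down, so the .bat file
    # (and Task Scheduler) sees the process terminate.
    sys.exit(exit_code)

The reactor.stop() inside crawl_showtimes_helper() can likewise be replaced with safe_stop().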

Regarding "python - How to exit a Scrapy Python script after the spider stops crawling or hits an exception?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43475731/
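
As a footnote: if the spiders do not strictly have to run one after another, Scrapy's CrawlerProcess avoids the manual reactor handling entirely, since process.start() starts the reactor and stops it once every queued spider has finished, letting the script fall off the end and exit. A sketch under the same assumed names (utils, PARENT_DIR, ShowTimesSpider):

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from cineplex.spiders import seatings_spider as st
import utils  # same assumed helper module as in the question

if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())
    output_dir = utils.create_dir_for_today(PARENT_DIR)  # PARENT_DIR as assumed above
    for cinema_id, cinema_name in utils.get_all_cinemas().items():
        process.crawl(st.ShowTimesSpider, cinema_id=cinema_id,
                      cinema_name=cinema_name, output_dir=output_dir)
    process.start()  # blocks until every spider is done, then the reactor stops

Note that the queued spiders run concurrently here rather than sequentially.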
