python - ReactorNotRestartable error in a Scrapy while loop

Reposted by: 太空宇宙. Updated: 2023-11-03 20:24:00


Executing the following code raises a twisted.internet.error.ReactorNotRestartable error:

from time import sleep
from scrapy import signals
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.xlib.pydispatch import dispatcher

result = None

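def set_result(item):
    # Without `global`, the assignment would only bind a local variable
    # and the module-level `result` would stay None.
    global result
    result = item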

while True:
    process = CrawlerProcess(get_project_settings())
    dispatcher.connect(set_result, signals.item_scraped)

    process.crawl('my_spider')
    process.start()

    if result:
        break
    sleep(3)

After the first pass through the loop, I get the error. I create a new process variable on every iteration, so what is the problem?

Best answer

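By default, CrawlerProcess's .start() stops the Twisted reactor it creates once all crawlers have finished. A Twisted reactor cannot be started again after it has been stopped, so the second call to .start() raises ReactorNotRestartable, even on a freshly created CrawlerProcess.

If you create process in each iteration, you should call process.start(stop_after_crawl=False). Be aware that the reactor then keeps running after the crawl, so .start() blocks until something else stops the reactor.

Another option is to handle the Twisted reactor yourself and use CrawlerRunner. The Scrapy docs have an example of doing this.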
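A minimal sketch of the CrawlerRunner approach, adapted to the retry loop from the question. It is not the example from the Scrapy docs. The names results, collect_item and crawl_until_result are hypothetical, and the sketch assumes it runs inside a Scrapy project that defines a spider named 'my_spider'. Depending on the Scrapy version and the project's TWISTED_REACTOR setting, a matching reactor may have to be installed before the reactor import.

```python
results = []  # items collected across all crawl attempts (hypothetical name)


def collect_item(item, **kwargs):
    """item_scraped handler: remember every scraped item."""
    results.append(item)


def crawl_until_result(spider_name="my_spider", delay=3):
    """Crawl repeatedly inside ONE reactor run until an item is scraped."""
    # Imported here so that importing this module has no reactor side effects.
    from twisted.internet import defer, reactor, task
    from scrapy import signals
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from scrapy.utils.project import get_project_settings

    configure_logging()  # CrawlerRunner does not set up logging by itself
    runner = CrawlerRunner(get_project_settings())

    @defer.inlineCallbacks
    def loop():
        while not results:
            # A fresh crawler per attempt; the reactor itself is never restarted.
            crawler = runner.create_crawler(spider_name)
            crawler.signals.connect(collect_item, signal=signals.item_scraped)
            yield runner.crawl(crawler)
            if not results:
                # Non-blocking replacement for sleep(3) in the question.
                yield task.deferLater(reactor, delay, lambda: None)

    # Stop the reactor once the loop ends, whether it succeeded or failed.
    loop().addBoth(lambda _: reactor.stop())
    reactor.run()  # started exactly once for the whole script
    return results
```

Calling crawl_until_result() from a script blocks until the first item arrives. The design choice is that the retry loop lives inside the reactor instead of around it, so the reactor is started and stopped only once.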

Regarding "python - ReactorNotRestartable error in a Scrapy while loop", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57962863/


