
python - Running multiple spiders sequentially

Reposted · Author: 太空宇宙 · Updated: 2023-11-03 11:24:51

class Myspider1(scrapy.Spider):
    # do something....
    pass

class Myspider2(scrapy.Spider):
    # do something...
    pass

The above is the layout of my spider.py file. I want to run Myspider1 first, and then run Myspider2 multiple times depending on some condition. How can I do that? Any tips?

configure_logging()
runner = CrawlerRunner()

def crawl():
    yield runner.crawl(Myspider1, arg.....)
    yield runner.crawl(Myspider2, arg.....)

crawl()
reactor.run()

I'm trying to do it this way, but I don't know how to run it. Should I run some command from the cmd prompt (which command?), or just run the Python file?

Thanks a lot!!!

Best Answer

Just run the Python file.
For example: test.py

import scrapy
from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

class MySpider1(scrapy.Spider):
    # Your first spider definition
    name = "dmoz1"  # note: the two spiders need distinct names
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"
    ]

    def parse(self, response):
        print("first spider")

class MySpider2(scrapy.Spider):
    # Your second spider definition
    name = "dmoz2"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        print("second spider")

configure_logging()
runner = CrawlerRunner()

@defer.inlineCallbacks
def crawl():
    yield runner.crawl(MySpider1)
    yield runner.crawl(MySpider2)
    reactor.stop()

crawl()
reactor.run()  # the script will block here until the last crawl call is finished

Now run `python test.py > output.txt`.
From output.txt you can see that your spiders ran sequentially.
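The accepted code runs each spider exactly once, while the question also asks to repeat Myspider2 based on some condition. With `@defer.inlineCallbacks`, that is just a `while` loop around the second `yield runner.crawl(...)`. The control flow can be sketched in plain Python, where `run_spider` and `should_run_again` are hypothetical stand-ins for the real `runner.crawl` call and for whatever condition you check, not Scrapy APIs:

```python
calls = []

def run_spider(name):
    # Stand-in for `yield runner.crawl(SpiderClass, ...)`:
    # records which spider "ran" instead of starting a real crawl.
    calls.append(name)

def should_run_again(runs, limit=3):
    # Stand-in for whatever condition decides another Myspider2 run.
    return runs < limit

run_spider("Myspider1")           # run the first spider once
runs = 0
while should_run_again(runs):     # then repeat the second spider
    run_spider("Myspider2")
    runs += 1

# In the real script this loop sits inside the @defer.inlineCallbacks
# function, with each iteration written as `yield runner.crawl(Myspider2, ...)`
# so one crawl finishes before the condition is re-checked, followed by
# reactor.stop() once the loop exits.
print(calls)
```

The `yield` is what makes the loop sequential: each `runner.crawl` returns a Deferred, and `inlineCallbacks` waits for it to fire before the next iteration runs.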

About python - running multiple spiders sequentially: we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36109400/
