I'd like to know what effect raising CloseSpider has. The documentation at http://doc.scrapy.org/en/latest/topics/exceptions.html#closespider says nothing about it. As you know, Scrapy handles several requests concurrently. What happens if this exception is raised before the last request has been processed? Will it wait until the rest of the previously generated requests are processed? Example:
from scrapy import Request
from scrapy.exceptions import CloseSpider

def parse(self, response):
    base_url = 'http://someurl.com/item/'
    for i in range(1, 100):
        if i == 50:
            raise CloseSpider('')
        # build each URL from the base instead of accumulating into one string
        yield Request(url=base_url + str(i), callback=self.my_handler)

def my_handler(self, response):
    # handler
    pass
Thanks for your replies.
======================== Possible solution:
is_alive = True  # flag checked before scheduling each new request

def parse(self, response):
    base_url = 'http://url.com/item/'
    for i in range(1, 100):
        if not self.is_alive:
            break
        yield Request(url=base_url + str(i), callback=self.my_handler)

def my_handler(self, response):
    if not response.css('.item'):  # placeholder check for "no new item on the page"
        self.is_alive = False
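For completeness, here is a minimal self-contained sketch of that pattern; the spider name, the URLs, and the '.item' selector are placeholders. One caveat: the flag only stops new requests from being yielded, so requests that were already scheduled before the flag flipped will still be downloaded and handled.

import scrapy


class ItemSpider(scrapy.Spider):
    name = 'item_spider'  # hypothetical name
    start_urls = ['http://url.com/item/1']

    is_alive = True  # stop flag, flipped by the callback below

    def parse(self, response):
        base_url = 'http://url.com/item/'
        for i in range(2, 100):
            if not self.is_alive:
                break  # stop scheduling further requests
            yield scrapy.Request(url=base_url + str(i), callback=self.my_handler)

    def my_handler(self, response):
        # placeholder selector standing in for "the page contains a new item"
        if not response.css('.item'):
            self.is_alive = False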
According to the source code, when a CloseSpider exception is raised, the engine.close_spider() method is executed:
def handle_spider_error(self, _failure, request, response, spider):
    exc = _failure.value
    if isinstance(exc, CloseSpider):
        self.crawler.engine.close_spider(spider, exc.reason or 'cancelled')
        return
engine.close_spider() itself closes the spider and clears all of its outstanding requests:
def close_spider(self, spider, reason='cancelled'):
    """Close (cancel) spider and clear all its outstanding requests"""
    slot = self.slot
    if slot.closing:
        return slot.closing
    logger.info("Closing spider (%(reason)s)",
                {'reason': reason},
                extra={'spider': spider})
    dfd = slot.close()
    # ...
It also schedules close_spider() calls for the various components of the Scrapy architecture: the downloader, the scraper, the scheduler, and so on.
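In other words, the outstanding requests are cancelled, not waited for. Here is a small sketch to observe this behavior (the spider name and URLs are placeholders; closed() is Scrapy's documented shortcut for the spider_closed signal):

import scrapy
from scrapy.exceptions import CloseSpider


class DemoSpider(scrapy.Spider):
    name = 'demo'  # hypothetical name
    start_urls = ['http://someurl.com/item/%d' % i for i in range(1, 100)]

    def parse(self, response):
        # stop on the very first response that comes back
        raise CloseSpider('no more items')

    def closed(self, reason):
        # called once the engine has finished closing the spider;
        # the remaining outstanding requests were cleared, not completed
        self.logger.info('Spider closed, reason: %s', reason)

Running this logs "Closing spider (no more items)" via the logger.info() call quoted above, and the requests still waiting in the scheduler are dropped rather than processed.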