
python - Adding a delay to a specific Scrapy request

Reposted. Author: 行者123. Updated: 2023-12-01 06:32:11

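Is it possible to delay the retry of one particular Scrapy request? I have a middleware that needs to defer a page request until a later time. I know how to do a basic deferral (send the request to the end of the queue) and how to delay all requests (global settings), but I want to delay just this single request. This matters most near the end of the queue: with a simple deferral, the request immediately becomes the next one to run again.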

Best answer

Method 1
One approach is to add a middleware to your spider (source linked):

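# File: middlewares.py

from twisted.internet import reactor
from twisted.internet.defer import Deferred


class DelayedRequestsMiddleware(object):
    def process_request(self, request, spider):
        # Per-request delay in seconds, read from the request's meta dict
        delay_s = request.meta.get('delay_request_by', None)
        if not delay_s:
            # No delay requested: let the request continue as usual
            return

        # Fire the Deferred with None after delay_s seconds; the reactor
        # stays free to handle other requests in the meantime
        deferred = Deferred()
        reactor.callLater(delay_s, deferred.callback, None)
        return deferred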

You can then use it in your spider like this:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {'middlewares.DelayedRequestsMiddleware': 123},
    }

    def start_requests(self):
        # This request will have itself delayed by 5 seconds
        yield scrapy.Request(url='http://quotes.toscrape.com/page/1/',
                             meta={'delay_request_by': 5})
        # This request will not be delayed
        yield scrapy.Request(url='http://quotes.toscrape.com/page/2/')

    def parse(self, response):
        ...  # Process results here
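
What makes Method 1 affect only one request is that the wait is non-blocking. The following is a minimal sketch of that idea using the standard library's asyncio as a stand-in for Twisted; it is not Scrapy code, and the names fetch and crawl are made up for illustration.

```python
import asyncio


async def fetch(name, delay_s, finished):
    """Stand-in for a download: optionally wait, then record completion."""
    if delay_s:
        # Non-blocking wait: the event loop keeps running other tasks,
        # much as reactor.callLater leaves the Twisted reactor free.
        await asyncio.sleep(delay_s)
    finished.append(name)


async def crawl():
    finished = []
    await asyncio.gather(
        fetch('page1', 0.2, finished),  # delayed request
        fetch('page2', 0, finished),    # undelayed request
    )
    return finished


order = asyncio.run(crawl())
print(order)  # ['page2', 'page1']
```

The undelayed page finishes first even though it was started second, which is the behaviour the question asks for.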

Method 2
You can do this with a custom retry middleware (source). You only need to override the process_response method of the current RetryMiddleware:

from scrapy.downloadermiddlewares.retry import RetryMiddleware
from scrapy.utils.response import response_status_message


class CustomRetryMiddleware(RetryMiddleware):

    def process_response(self, request, response, spider):
        # Respect the per-request opt-out from retries
        if request.meta.get('dont_retry', False):
            return response
        if response.status in self.retry_http_codes:
            reason = response_status_message(response.status)

            # Your delay code here, for example sleep(10) or polling the
            # server until it is alive. Note that a blocking call such as
            # time.sleep(10) stalls the Twisted reactor, so it pauses every
            # request, not just this one.

            return self._retry(request, reason, spider) or response

        return response

Then enable it in place of the default RetryMiddleware in settings.py:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'myproject.middlewarefilepath.CustomRetryMiddleware': 550,
}
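
The two methods could plausibly be combined so that a retry is delayed without a blocking sleep: instead of sleeping in process_response, the retried request's meta gets a delay_request_by value that DelayedRequestsMiddleware from Method 1 then honours. The sketch below only shows how such a delay might be computed; backoff_delay, with_retry_delay, and their defaults are hypothetical, and a plain dict stands in for request.meta. It assumes the retry_times counter that Scrapy's RetryMiddleware stores in meta.

```python
def backoff_delay(retry_times, base_delay=2.0, max_delay=60.0):
    """Seconds to wait before the given retry attempt (1-based).

    Doubles the wait on each attempt and caps it at max_delay.
    The parameter names and defaults are illustrative assumptions.
    """
    if retry_times < 1:
        return 0.0
    return min(base_delay * 2 ** (retry_times - 1), max_delay)


def with_retry_delay(meta):
    """Return a copy of a meta dict with 'delay_request_by' set.

    'retry_times' is the counter RetryMiddleware keeps in meta;
    'delay_request_by' is the key read by DelayedRequestsMiddleware.
    """
    delayed = dict(meta)
    delayed['delay_request_by'] = backoff_delay(meta.get('retry_times', 0))
    return delayed


# Hypothetical wiring inside CustomRetryMiddleware.process_response:
#     retry_request = self._retry(request, reason, spider)
#     if retry_request is not None:
#         retry_request.meta.update(with_retry_delay(retry_request.meta))
#         return retry_request
#     return response

print(with_retry_delay({'retry_times': 2}))
```

With these defaults the first retry waits 2 seconds, the second 4, the third 8, and so on up to 60. A request that has never been retried gets a delay of 0, which DelayedRequestsMiddleware treats as no delay.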

Regarding python - adding a delay to a specific Scrapy request, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/19135875/
