
python - Limiting/throttling the rate of HTTP requests in GRequests

Reposted. Author: IT老高. Last updated: 2023-10-28 20:26:42

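I'm writing a small script in Python 2.7.3 with GRequests and lxml that lets me collect collectible card prices from various websites and compare them. The problem is that one of the websites limits the number of requests and sends back HTTP error 429 if I exceed it.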

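Is there a way to limit the number of requests in GRequests so that I don't exceed a number of requests per second that I specify? Also, if an HTTP 429 occurs, how can I make GRequests retry after some time?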

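On a side note, their limit is ridiculously low: 8 requests per 15 seconds. I breached it with my browser on multiple occasions just by refreshing the page while waiting for prices to change.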
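For scale, 8 requests per 15 seconds averages one request every 15 / 8 = 1.875 seconds. One common way to enforce a limit of exactly that shape is a sliding window that remembers when recent requests were sent. The Python 3 sketch below is a swapped-in technique, not the approach the answer takes; `SlidingWindowLimiter` and its `clock` parameter are hypothetical names, and the clock is injectable so the behaviour can be checked without sleeping.

```python
import collections
import time


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window (hypothetical helper)."""

    def __init__(self, max_calls, period, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable for tests; must return monotonic seconds
        self._stamps = collections.deque()  # times of recently allowed calls

    def _prune(self, now):
        # Drop timestamps that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()

    def try_acquire(self):
        """Record and allow a call if the window has room; otherwise return False."""
        now = self.clock()
        self._prune(now)
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False

    def wait_time(self):
        """Seconds until try_acquire() can next succeed (0.0 if it can now)."""
        now = self.clock()
        self._prune(now)
        if len(self._stamps) < self.max_calls:
            return 0.0
        # The oldest call leaves the window `period` seconds after it happened.
        return self._stamps[0] + self.period - now


# The site's limit from the question: 8 requests per 15 seconds.
limiter = SlidingWindowLimiter(max_calls=8, period=15.0)
```

A caller would loop on `try_acquire()` and sleep for `wait_time()` seconds between attempts; in a GRequests script that sleep would be `gevent.sleep`, so other greenlets keep running.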

Best answer

Answering my own question, since I had to figure this out by myself and there seems to be very little information about it out there.

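The idea is as follows. Every request object used with GRequests can take a session object as an argument when it is created. A session object, in turn, can have HTTP adapters mounted on it, and those adapters are used when the requests are actually sent. By creating our own adapter, we can intercept requests and throttle them in whatever way suits our application best. In my case, I ended up with the code below.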
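To illustrate the interception point described above, here is a minimal Python 3 sketch assuming requests 2.x is installed. `RecordingStubAdapter` is a hypothetical name; instead of calling `super().send(...)` it records the request and returns a canned response, so the example shows the session routing requests into the adapter without touching the network.

```python
import requests
from requests.adapters import HTTPAdapter


class RecordingStubAdapter(HTTPAdapter):
    """Hypothetical adapter: records each outgoing request and answers it locally.

    A real throttling adapter would wait here and then call super().send(...);
    this stub returns a canned response instead, so nothing goes over the network.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []

    def send(self, request, **kwargs):
        # `request` is the PreparedRequest that the Session built.
        self.seen.append((request.method, request.url))
        response = requests.Response()
        response.status_code = 200
        response.url = request.url
        response.request = request
        return response


session = requests.Session()
session.trust_env = False  # ignore environment proxies/.netrc so the demo stays local
stub = RecordingStubAdapter()
# Every URL starting with one of these prefixes is routed through `stub`.
session.mount("http://", stub)
session.mount("https://", stub)

resp = session.get("https://example.com/cards?page=1")
print(resp.status_code, stub.seen)
```

Whatever `send` does before returning (sleeping, counting, retrying) applies to every request the session makes, which is the hook the answer's adapter relies on.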

The object used for throttling:

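import datetime

# Default timing: requests may start only within the first 5 seconds of a
# window; after that, new requests wait until 15 more seconds have passed.
DEFAULT_BURST_WINDOW = datetime.timedelta(seconds=5)
DEFAULT_WAIT_WINDOW = datetime.timedelta(seconds=15)


class BurstThrottle(object):
    """Allow up to max_hits requests inside burst_window, then hold new requests
    until burst_window + wait_window has elapsed since the window opened."""
    max_hits = None
    hits = None
    burst_window = None
    total_window = None
    timestamp = None

    def __init__(self, max_hits, burst_window, wait_window):
        self.max_hits = max_hits
        self.hits = 0
        self.burst_window = burst_window
        self.total_window = burst_window + wait_window
        self.timestamp = datetime.datetime.min  # no window opened yet

    def throttle(self):
        """Return how long to wait before sending (timedelta(0) means send now)."""
        now = datetime.datetime.utcnow()
        if now < self.timestamp + self.total_window:
            # Still inside the current window.
            if (now < self.timestamp + self.burst_window) and (self.hits < self.max_hits):
                self.hits += 1
                return datetime.timedelta(0)
            else:
                # Burst used up or burst window over: wait until the window ends.
                return self.timestamp + self.total_window - now
        else:
            # The previous window has expired: open a new one and count this request.
            self.timestamp = now
            self.hits = 1
            return datetime.timedelta(0)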
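The timing rule in `throttle()` is easier to see when the clock is passed in. The sketch below is a hypothetical test helper (`ClockedBurstThrottle` is not part of the answer) that mirrors the same logic using plain seconds instead of `datetime` values, so a timeline can be replayed without sleeping.

```python
class ClockedBurstThrottle(object):
    """Same burst/wait rule as BurstThrottle, but the caller passes `now` in seconds.

    Hypothetical helper for illustration only.
    """

    def __init__(self, max_hits, burst_window, wait_window):
        self.max_hits = max_hits
        self.hits = 0
        self.burst_window = burst_window
        self.total_window = burst_window + wait_window
        self.timestamp = float("-inf")  # no window opened yet

    def throttle(self, now):
        """Return seconds to wait before sending at time `now` (0.0 means send now)."""
        if now < self.timestamp + self.total_window:
            if now < self.timestamp + self.burst_window and self.hits < self.max_hits:
                self.hits += 1
                return 0.0
            # Burst used up or burst window over: wait for the window to end.
            return self.timestamp + self.total_window - now
        # The previous window has expired: open a new one.
        self.timestamp = now
        self.hits = 1
        return 0.0


throttle = ClockedBurstThrottle(max_hits=3, burst_window=5.0, wait_window=15.0)
print([throttle.throttle(t) for t in (0.0, 1.0, 2.0, 3.0, 20.0)])
```

With `max_hits=3`, a 5-second burst window and a 15-second wait window, the calls at t = 0, 1 and 2 go through, a call at t = 3 is told to wait 17 seconds, and a new window opens at t = 20. A request arriving after the burst window has closed (for example at t = 6 after a single hit at t = 0) also waits until the window ends, even though hits remain. With the 5 + 20 second settings used in the setup, the adapter therefore allows at most `pool_maxsize` requests in each 25-second window.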

The HTTP adapter:

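import datetime

import gevent
import requests
import requests.adapters

# BurstThrottle, DEFAULT_BURST_WINDOW and DEFAULT_WAIT_WINDOW come from the
# throttling code above (same module).


class MyHttpAdapter(requests.adapters.HTTPAdapter):
    """HTTP adapter that throttles outgoing requests and resends them on HTTP 429."""
    throttle = None

    def __init__(self, pool_connections=requests.adapters.DEFAULT_POOLSIZE,
                 pool_maxsize=requests.adapters.DEFAULT_POOLSIZE, max_retries=requests.adapters.DEFAULT_RETRIES,
                 pool_block=requests.adapters.DEFAULT_POOLBLOCK, burst_window=DEFAULT_BURST_WINDOW,
                 wait_window=DEFAULT_WAIT_WINDOW):
        # At most pool_maxsize requests may start within each burst window.
        self.throttle = BurstThrottle(pool_maxsize, burst_window, wait_window)
        super(MyHttpAdapter, self).__init__(pool_connections=pool_connections, pool_maxsize=pool_maxsize,
                                            max_retries=max_retries, pool_block=pool_block)

    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        request_successful = False
        response = None
        # Note: this keeps retrying for as long as the server answers 429.
        while not request_successful:
            # Sleep cooperatively (other greenlets keep running) until the throttle allows a request.
            wait_time = self.throttle.throttle()
            while wait_time > datetime.timedelta(0):
                gevent.sleep(wait_time.total_seconds(), ref=True)
                wait_time = self.throttle.throttle()

            response = super(MyHttpAdapter, self).send(request, stream=stream, timeout=timeout,
                                                       verify=verify, cert=cert, proxies=proxies)

            if response.status_code != 429:
                request_successful = True
            else:
                # Rate limited: release the connection, then go round again through the throttle.
                response.close()

        return response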
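The loop in `send` resends a 429 as soon as the throttle allows, and keeps doing so for as long as the server answers 429. RFC 6585, which defines status 429, says such a response MAY include a `Retry-After` header. Below is a Python 3 sketch of a helper that honours that header and otherwise falls back to capped exponential backoff; `retry_delay` and its parameters are hypothetical, and `headers` is assumed to be a mapping such as `response.headers` in requests (which is case-insensitive; a plain dict is not).

```python
import datetime
import email.utils


def retry_delay(headers, attempt, base=1.0, cap=60.0, now=None):
    """Seconds to wait before resending after a 429 (hypothetical helper).

    Uses Retry-After when present (delta-seconds or HTTP-date); otherwise
    falls back to exponential backoff base * 2**attempt, capped at `cap`.
    """
    value = headers.get("Retry-After")
    if value is not None:
        value = value.strip()
        if value.isdigit():
            return float(value)
        try:
            when = email.utils.parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # unparseable header: fall back to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=datetime.timezone.utc)
            if now is None:
                now = datetime.datetime.now(datetime.timezone.utc)
            return max((when - now).total_seconds(), 0.0)
    return min(base * 2 ** attempt, cap)
```

Inside the 429 branch, `MyHttpAdapter.send` could keep an attempt counter and call `gevent.sleep(retry_delay(response.headers, attempt))` before looping; that wiring is an assumption, not part of the answer.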

Setup:

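import datetime

import grequests  # import first: grequests applies gevent monkey-patching when imported
import requests

import adapter  # the module holding BurstThrottle and MyHttpAdapter

# __CONCURRENT_LIMIT__, handle_response and urls are defined elsewhere in the script.
requests_adapter = adapter.MyHttpAdapter(
    pool_connections=__CONCURRENT_LIMIT__,
    pool_maxsize=__CONCURRENT_LIMIT__,
    max_retries=0,
    pool_block=False,
    burst_window=datetime.timedelta(seconds=5),
    wait_window=datetime.timedelta(seconds=20))

# Route every http:// and https:// request made through this session via the throttling adapter.
requests_session = requests.session()
requests_session.mount('http://', requests_adapter)
requests_session.mount('https://', requests_adapter)

unsent_requests = (grequests.get(url,
                                 hooks={'response': handle_response},
                                 session=requests_session) for url in urls)
grequests.map(unsent_requests, size=__CONCURRENT_LIMIT__)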
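For the retry half of the question there is also a built-in route: `requests.adapters.HTTPAdapter` accepts a urllib3 `Retry` object as `max_retries`. The sketch below is an alternative to the answer's hand-written 429 loop, not what the answer uses; it assumes a urllib3 version that supports `respect_retry_after_header` and `raise_on_status`, and it only retries, it does not limit the request rate.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry requests (GET is retried by default) that come back with 429.
retry = Retry(
    total=5,                          # give up after five retries
    status_forcelist=[429],           # status codes that trigger a retry
    backoff_factor=1.0,               # exponential backoff between attempts
    respect_retry_after_header=True,  # honour the server's Retry-After if it sends one
    raise_on_status=False,            # return the last 429 instead of raising
)
retry_adapter = HTTPAdapter(max_retries=retry)

session = requests.Session()
session.mount("http://", retry_adapter)
session.mount("https://", retry_adapter)
```

Because `MyHttpAdapter.__init__` forwards `max_retries` to `HTTPAdapter`, the same `Retry` object could in principle be passed there instead of `max_retries=0`; the adapter's own 429 check would then only see the final response. urllib3 waits with `time.sleep`, which becomes cooperative under the gevent monkey-patching that grequests applies when it is imported.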

Regarding python - Limiting/throttling the rate of HTTP requests in GRequests, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20247354/
