
python - Scaling Scrapyd horizontally

Reposted. Author: 太空狗. Updated: 2023-10-30 01:33:34

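What tool or set of tools would you use to scale scrapyd horizontally, adding new machines to a scrapyd cluster dynamically and running N instances per machine when needed? The instances do not all have to share a common job queue, but that would be great.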

Scrapy-cluster looks promising for the job, but I want a Scrapyd-based solution, so I am open to other alternatives and suggestions.

Best answer

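I wrote my own load balancer script for Scrapyd, using Scrapyd's API and a wrapper around it. It asks each server for its pending jobs and returns a client for the least occupied one.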

    from random import shuffle

    from scrapyd_api.wrapper import ScrapydAPI

    # Assumption: `settings` is the project's own settings module defining
    # SERVERS_URLS, DEFAULT_PROJECT and ACCEPTABLE_PENDING; adjust the import.
    import settings


    class JobLoadBalancer(object):

        @classmethod
        def get_less_occupied(
                cls,
                servers_urls=settings.SERVERS_URLS,
                project=settings.DEFAULT_PROJECT,
                acceptable=settings.ACCEPTABLE_PENDING):
            """Return a client for the server with the fewest pending jobs."""
            free_runner = {'num_jobs': 9999, 'client': None}
            # Shuffle a copy so the same server is not always probed first
            # and the caller's list is not reordered in place.
            servers_urls = list(servers_urls)
            shuffle(servers_urls)
            for url in servers_urls:
                scrapyd = ScrapydAPI(target=url)
                jobs = scrapyd.list_jobs(project)
                num_jobs = len(jobs['pending'])

                if free_runner['num_jobs'] > num_jobs:
                    free_runner['num_jobs'] = num_jobs
                    free_runner['client'] = scrapyd
                # Optimization: stop looking once a server with an acceptable
                # number of pending jobs has been found.
                if free_runner['client'] and free_runner['num_jobs'] <= acceptable:
                    break

            return free_runner['client']
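
Once a client has been picked, the next step is to schedule the job on it. The following is a minimal sketch of that step, not part of the original answer. `FakeScrapydClient` and `schedule_on_least_busy` are hypothetical names introduced here; the fake client only mimics the `list_jobs` and `schedule` methods used above so that the example runs without any Scrapyd server. With real servers, `ScrapydAPI` instances would take its place.

```python
class FakeScrapydClient:
    """Hypothetical stand-in exposing the two client methods used here."""

    def __init__(self, target, pending=0):
        self.target = target
        self._pending = [{"id": f"job-{i}"} for i in range(pending)]

    def list_jobs(self, project):
        # Same top-level keys as Scrapyd's job listing.
        return {"pending": list(self._pending), "running": [], "finished": []}

    def schedule(self, project, spider, **kwargs):
        job_id = f"job-{len(self._pending)}"
        self._pending.append({"id": job_id, "spider": spider})
        return job_id


def schedule_on_least_busy(clients, project, spider, **spider_args):
    """Schedule `spider` on the client with the fewest pending jobs."""
    if not clients:
        raise ValueError("no Scrapyd clients configured")
    # min() keeps the first client on ties, so list order decides.
    client = min(clients, key=lambda c: len(c.list_jobs(project)["pending"]))
    job_id = client.schedule(project, spider, **spider_args)
    return client.target, job_id


clients = [
    FakeScrapydClient("http://localhost:6800", pending=2),
    FakeScrapydClient("http://localhost:6900", pending=0),
]
target, job_id = schedule_on_least_busy(clients, "dummy", "dummy_spider")
print(target, job_id)
```

Because the selection happens on every call, a machine added to the list of clients starts receiving jobs as soon as it reports fewer pending jobs than the others.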

Unit test:

    import unittest

    # Assumes settings, ScrapydAPI and JobLoadBalancer from the code above
    # are in scope. The original base class is not shown; unittest.TestCase
    # is an assumption.
    class TestFactory(unittest.TestCase):

        def setUp(self):
            super(TestFactory, self).setUp()
            # Make sure these servers are running
            settings.SERVERS_URLS = [
                'http://localhost:6800',
                'http://localhost:6900'
            ]
            self.project = 'dummy'
            self.spider = 'dummy_spider'
            self.acceptable = 0

        def test_get_less_occupied(self):
            # Add dummy jobs to the first server so that the second one is chosen
            scrapyd = ScrapydAPI(target=settings.SERVERS_URLS[0])
            scrapyd.schedule(project=self.project, spider=self.spider)
            scrapyd.schedule(project=self.project, spider=self.spider)
            second_server_url = settings.SERVERS_URLS[1]
            scrapyd = JobLoadBalancer.get_less_occupied(
                servers_urls=settings.SERVERS_URLS,
                project=self.project,
                acceptable=self.acceptable)
            self.assertEqual(scrapyd.target, second_server_url)

This code targets an old version of scrapyd, since it was written more than a year ago.
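
For Scrapyd releases that expose the `daemonstatus.json` endpoint, the same idea can be sketched with only the standard library. This is an illustration under assumptions, not the answer's code: `fetch_status` and `least_loaded` are hypothetical helper names, and the load measure (pending plus running) is a choice made here. Note that this endpoint reports counts for the whole daemon rather than for a single project.

```python
import json
from urllib.request import urlopen


def fetch_status(base_url, timeout=5):
    """GET <base_url>/daemonstatus.json and return the decoded JSON body."""
    url = base_url.rstrip("/") + "/daemonstatus.json"
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def least_loaded(server_urls, fetch=fetch_status):
    """Return the URL with the lowest pending + running count, or None."""
    best_url, best_load = None, None
    for url in server_urls:
        try:
            status = fetch(url)
        except (OSError, ValueError):
            continue  # unreachable server or malformed reply: skip it
        if status.get("status") != "ok":
            continue
        # int() tolerates counts that arrive as numeric strings.
        load = int(status.get("pending", 0)) + int(status.get("running", 0))
        if best_load is None or load < best_load:
            best_url, best_load = url, load
    return best_url
```

The `fetch` parameter exists so the selection logic can be exercised without live servers, for example by passing a function that returns canned dictionaries.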

Regarding python - Scaling Scrapyd horizontally, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31617562/
