python - Python asyncio: completing in order

Reposted by: 行者123. Last updated: 2023-12-01 11:27:56

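TL;DR

Is there a way to wait on multiple futures and yield from them as they complete, in a given order?

Long story

Suppose you have two data sources. One gives you an id -> name mapping, the other an id -> age mapping. You want to compute (name, age) -> number_of_ids_with_that_name_and_age.

There is too much data to load all at once, but both data sources support paging/iterating and ordering by id.

So you write something like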

def iterate_names():
    for page in get_name_page_numbers():
        yield from iterate_name_page(page)  # yields (id, name) pairs

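and the same for ages, and then you iterate over iterate_names() and iterate_ages().

What's wrong with that? What happens is:
- You request one page of names and one page of ages
- You get them
- You process the data until you reach the end of a page, say, of ages
- You request another page of ages
- You process the data until...

Basically, no request is in flight while you are processing data, and no data is being processed while you wait for a request.

You could use asyncio.gather to send all the requests and wait for all the data, but then:
- When the first page arrives, you are still waiting for the others
- You run out of memory

There is asyncio.as_completed, which lets you send all the requests and process pages as the results come in, but the pages arrive out of order, so you cannot do the processing.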
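To make the reordering concrete, here is a minimal sketch. `fetch_page` and its `delay` parameter are hypothetical stand-ins for a real page request; they are not part of the question's code.

```python
import asyncio


async def fetch_page(page_number, delay):
    # Hypothetical stand-in for a network request; `delay` simulates latency.
    await asyncio.sleep(delay)
    return page_number


async def arrival_order():
    # Page 0 is the slowest to respond, page 1 the fastest.
    delays = [0.15, 0.05, 0.10]
    coros = [fetch_page(n, d) for n, d in enumerate(delays)]
    order = []
    # as_completed yields awaitables in completion order, not submission order.
    for next_done in asyncio.as_completed(coros):
        order.append(await next_done)
    return order


print(asyncio.run(arrival_order()))
```

With these delays the pages come back as 1, 2, 0, which is useless for a merge that relies on both sources being sorted by id.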

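Ideally, there would be a function that makes the first request and, as the response arrives, makes the second request while yielding the results of the first.

Is that possible?

Best answer

There is a lot going on in your question. I'll try to address all of it.

Is there a way to wait on multiple futures and yield from them as they complete, in a given order?

Yes. Your code can yield from or await any number of futures in sequence. If you are talking about Tasks specifically and you want those tasks to execute concurrently, they simply need to be scheduled on the loop (which happens when you call asyncio.ensure_future() or loop.create_task()), and the loop needs to be running.

As for yielding from them in order, you establish that order when you create the tasks. In a simple case where all the tasks/futures are created before you start processing any of them, you can store the task futures in a list and then pull them from the list: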

import asyncio

loop = asyncio.get_event_loop()
tasks_im_waiting_for = []
for thing in things_to_get:
    task = loop.create_task(get_a_thing_coroutine(thing))
    tasks_im_waiting_for.append(task)

@asyncio.coroutine
def process_gotten_things(getter_tasks):
    # Wait on the getter tasks in list order, one at a time.
    for task in getter_tasks:
        result = yield from task
        print("We got {}".format(result))

loop.run_until_complete(process_gotten_things(tasks_im_waiting_for))
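The same pattern can be sketched as a self-contained script in async/await syntax. The `@asyncio.coroutine` decorator used in the snippet above was removed in Python 3.11, so this version is the one that runs on current interpreters. `get_a_thing` is a hypothetical getter whose later items finish sooner, to show that completion order and processing order are independent.

```python
import asyncio


async def get_a_thing(thing):
    # Hypothetical getter: thing 0 takes longest, thing 2 finishes first.
    await asyncio.sleep(0.05 * (3 - thing))
    return thing * 10


async def process_in_order(things_to_get):
    # create_task() schedules every getter right away, in this order.
    tasks = [asyncio.create_task(get_a_thing(t)) for t in things_to_get]
    results = []
    for task in tasks:
        # Await in list order; the other getters keep running meanwhile.
        results.append(await task)
    return results


print(asyncio.run(process_in_order([0, 1, 2])))
```

The results come back in submission order even though the last getter completes first, because the loop only ever awaits the head of the list.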

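This example only processes one result at a time, but it still lets every scheduled getter task keep doing its work while we wait for the next getter task in the sequence to complete. If the processing order did not matter as much and we wanted to process more than one potentially ready result at a time, we could use a deque instead of a list, with more than one process_gotten_things task calling .pop() on the deque. If we wanted to get even more advanced, we could do as Vincent suggests in a comment to your question and use an asyncio.Queue instead of a deque. With such a queue, you can have a producer adding tasks to the queue while it runs concurrently with the task-processing consumers.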
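A minimal sketch of the queue variant, assuming a single consumer so that results are still handled in submission order. The names `get_a_thing`, `producer`, `consumer`, and `run_pipeline`, the `None` sentinel, and the `maxsize` value are all illustrative choices, not part of the original answer.

```python
import asyncio


async def get_a_thing(thing):
    await asyncio.sleep(0.01)  # hypothetical I/O
    return thing * 2


async def producer(queue, things):
    for thing in things:
        # The getter starts running as soon as its task is created.
        # put() waits while the queue is full, which roughly bounds
        # how many requests are in flight at once.
        await queue.put(asyncio.create_task(get_a_thing(thing)))
    await queue.put(None)  # sentinel: nothing more to come


async def consumer(queue, results):
    while True:
        task = await queue.get()
        if task is None:
            break
        results.append(await task)


async def run_pipeline(things):
    queue = asyncio.Queue(maxsize=3)
    results = []
    await asyncio.gather(producer(queue, things), consumer(queue, results))
    return results


print(asyncio.run(run_pipeline(range(6))))
```

With several consumers the ordering guarantee is lost, since each consumer awaits its own task independently.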

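Using a deque or Queue to sequence futures for processing has a disadvantage, though: you only process as many futures concurrently as you have processor tasks running. You could create a new processor task every time you queue up a new future, but at that point the queue becomes a completely redundant data structure, because asyncio already gives you a queue-like object where everything added gets processed concurrently: the event loop. For every task we schedule, we can also schedule its processing. Revising the example above: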

for thing in things_to_get:
    getter_task = loop.create_task(get_a_thing_coroutine(thing))
    processor_task = loop.create_task(process_gotten_thing(getter_task))
    # Tasks are futures; the processor can await the result once started
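As a runnable illustration in async/await syntax, the pairing of one processor task per getter task might look like the following. `get_a_thing`, `process_gotten_thing`, and the squaring step are hypothetical placeholders for real fetching and processing.

```python
import asyncio


async def get_a_thing(thing):
    await asyncio.sleep(0.01)  # hypothetical I/O
    return thing


async def process_gotten_thing(getter_task, processed):
    # A Task is a future, so the processor can simply await it.
    result = await getter_task
    processed.add(result * result)


async def fetch_and_process(things):
    processed = set()
    processors = []
    for thing in things:
        getter_task = asyncio.create_task(get_a_thing(thing))
        processors.append(
            asyncio.create_task(process_gotten_thing(getter_task, processed))
        )
    await asyncio.gather(*processors)
    return processed


print(asyncio.run(fetch_and_process([1, 2, 3])))
```

The results are collected in a set on purpose: with this pattern the processors may finish in any order.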

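Now let's say our getters might return multiple things (kind of like your scenario), and each of those things needs some processing. That brings me to a different asyncio design pattern: subtasks. Your tasks can schedule other tasks on the event loop. As the event loop runs, the order of your first tasks is still maintained, but if any one of them ends up waiting on something, one of your subtasks may get started in the meantime. Revising the scenario above, we might pass the loop to our coroutine so the coroutine can schedule the tasks that will process its results: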

@asyncio.coroutine
def get_a_thing_coroutine(thing, loop):
    results = yield from long_time_database_call(thing)
    subtasks = []
    for result in results:
        subtasks.append(loop.create_task(process_result(result)))
    # With subtasks scheduled in the order we like, wait for them
    # to finish before we consider THIS task complete.
    # asyncio.wait() raises ValueError on an empty collection, so guard it.
    if subtasks:
        yield from asyncio.wait(subtasks)

for thing in things_to_get:
    task = loop.create_task(get_a_thing_coroutine(thing, loop))
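A self-contained sketch of the subtask pattern in async/await syntax follows. Inside a running coroutine `asyncio.create_task()` uses the current loop, so the loop no longer needs to be passed around. `long_time_database_call` returning two rows per call and the `totals` list are assumptions made for the demonstration.

```python
import asyncio


async def long_time_database_call(thing):
    await asyncio.sleep(0.01)  # hypothetical slow query
    return [thing, thing + 1]  # assumption: two rows per call


async def process_result(result, totals):
    await asyncio.sleep(0)  # yield to the loop, as real I/O would
    totals.append(result)


async def get_and_process(thing, totals):
    results = await long_time_database_call(thing)
    # Schedule one subtask per result, in the order we want them started.
    subtasks = [asyncio.create_task(process_result(r, totals)) for r in results]
    # gather() accepts zero awaitables, so an empty result set is fine.
    await asyncio.gather(*subtasks)


async def run_all(things):
    totals = []
    await asyncio.gather(*(get_and_process(t, totals) for t in things))
    return sorted(totals)


print(asyncio.run(run_all([10, 20])))
```

The output is sorted before returning because the subtasks of different getters interleave on the loop.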

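All of these advanced patterns start tasks in the order you want, but they may finish processing them in any order. If you truly need to process the results in exactly the order in which you started fetching them, stick to a single processor that pulls result futures from a sequence or from an asyncio.Queue.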
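One way to sketch such a single ordered processor is an async generator that keeps a small, fixed number of requests in flight and yields their results strictly in submission order. This is an illustration rather than code from the original answer: `ordered_prefetch`, its `lookahead` parameter, and `fetch_page` are hypothetical names, and error handling and cancellation of leftover tasks are omitted.

```python
import asyncio
import functools
from collections import deque


async def ordered_prefetch(factories, lookahead=2):
    """Yield results in submission order with up to `lookahead` requests in flight."""
    factories = iter(factories)
    pending = deque()

    def top_up():
        # Start new requests until the lookahead window is full.
        while len(pending) < lookahead:
            make_coro = next(factories, None)
            if make_coro is None:
                return
            pending.append(asyncio.create_task(make_coro()))

    top_up()
    while pending:
        result = await pending.popleft()
        top_up()  # issue the next request before handing back this result
        yield result


async def fetch_page(n):
    # Hypothetical request: later pages respond faster than earlier ones.
    await asyncio.sleep(0.01 * (5 - n))
    return n


async def collect():
    factories = [functools.partial(fetch_page, n) for n in range(5)]
    return [page async for page in ordered_prefetch(factories)]


print(asyncio.run(collect()))
```

Passing zero-argument factories instead of coroutine objects means a request is only created when there is room for it in the window, which keeps memory use bounded.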

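You will also notice that, to ensure tasks start in a predictable order, I explicitly schedule them with loop.create_task(). While asyncio.gather() and asyncio.wait() will happily take coroutine objects and schedule/wrap them as Tasks, as of this writing they have problems scheduling them in a predictable order. See asyncio issue #432.

OK, let's get back to your specific case. You have two separate sources of results, and those results need to be joined on a common key, the id. The patterns I described for getting and processing things do not account for a problem like this, and I don't know off the top of my head what the perfect pattern for it is. I'll walk through what I would do to attempt it.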

We need some objects to maintain the state of what we know and what we have done so far, so that we can correlate that knowledge as it grows.

# defaultdicts are great for representing knowledge that an interested
# party might want whether or not we have any knowledge to begin with:
from collections import defaultdict

# Let's start with a place to store our end goal:
name_and_age_to_id_count = defaultdict(int)

# Given we're correlating info from two sources, let's make two places to
# store that info, keyed by what we're joining on: id
# When we correlate this info, only one side might be known, so use a
# Future on both sides to represent data we may or may not have yet.
id_to_age_future = defaultdict(loop.create_future)
id_to_name_future = defaultdict(loop.create_future)

# As soon as we learn the name or age for an id, we can begin processing
# the joint information, but because this information is coming from
# multiple sources we want to process concurrently we need to keep track
# of what ids we've started processing the joint info for.
ids_scheduled_for_processing = set()

We know we will be getting this information in "pages" through the iterators you mentioned, so let's start designing our tasks there:

@asyncio.coroutine
def process_name_page(page_number):
    subtasks = []
    for id, name in iterate_name_page(page_number):
        name_future = id_to_name_future[id]
        name_future.set_result(name)
        if id not in ids_scheduled_for_processing:
            age_future = id_to_age_future[id]
            task = loop.create_task(increment_name_age_pair(id, name_future, age_future))
            subtasks.append(task)
            ids_scheduled_for_processing.add(id)
    # asyncio.wait() raises ValueError on an empty collection, which happens
    # when every id on this page was already scheduled by the other source.
    if subtasks:
        yield from asyncio.wait(subtasks)

@asyncio.coroutine
def process_age_page(page_number):
    subtasks = []
    for id, age in iterate_age_page(page_number):
        age_future = id_to_age_future[id]
        age_future.set_result(age)
        if id not in ids_scheduled_for_processing:
            name_future = id_to_name_future[id]
            task = loop.create_task(increment_name_age_pair(id, name_future, age_future))
            subtasks.append(task)
            ids_scheduled_for_processing.add(id)
    if subtasks:
        yield from asyncio.wait(subtasks)

These coroutines schedule the name/age pair of an id for processing, or more specifically, the name and age futures for an id. Once started, a processor waits for the results of both futures (joining them, in a sense).

@asyncio.coroutine
def increment_name_age_pair(id, name_future, age_future):
    # This will wait until both futures are resolved and let other tasks
    # work in the meantime:
    pair = ((yield from name_future), (yield from age_future))
    name_and_age_to_id_count[pair] += 1
    # If memory is a concern:
    ids_scheduled_for_processing.discard(id)
    del id_to_age_future[id]
    del id_to_name_future[id]

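OK, we have tasks for getting/iterating pages and subtasks for processing what is in those pages. Now we need to actually schedule the fetching of those pages. Back to your problem: we have two data sources we want to pull from, and we want to pull from them in parallel. We assume the order of information from one closely correlates with the order of information from the other, so we interleave the processing of both on the event loop.

from itertools import zip_longest

page_processing_tasks = []
# Interleave name and age pages:
for name_page_number, age_page_number in zip_longest(
    get_name_page_numbers(),
    get_age_page_numbers()
):
    # Explicitly schedule it as a task in the order we want because gather
    # and wait have non-deterministic scheduling order:
    if name_page_number is not None:
        page_processing_tasks.append(loop.create_task(process_name_page(name_page_number)))
    if age_page_number is not None:
        page_processing_tasks.append(loop.create_task(process_age_page(age_page_number)))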

Now that we have scheduled the top-level tasks, we can finally actually do the thing:

loop.run_until_complete(asyncio.wait(page_processing_tasks))
print(name_and_age_to_id_count)
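Putting the pieces together, the whole join can be sketched as one runnable script in async/await syntax. The in-memory `NAME_PAGES` and `AGE_PAGES` lists and the `fetch_page` helper are hypothetical stand-ins for the real paged data sources; the structure otherwise follows the steps above, with the two page processors folded into one parameterized coroutine.

```python
import asyncio
from collections import defaultdict
from itertools import zip_longest

# Hypothetical in-memory "pages", sorted by id, standing in for real sources.
NAME_PAGES = [[(1, "ann"), (2, "bob")], [(3, "ann")]]
AGE_PAGES = [[(1, 30)], [(2, 41), (3, 30)]]


async def fetch_page(pages, page_number):
    await asyncio.sleep(0.01)  # simulated network latency
    return pages[page_number]


async def count_name_age_pairs():
    loop = asyncio.get_running_loop()
    counts = defaultdict(int)
    name_futures = defaultdict(loop.create_future)
    age_futures = defaultdict(loop.create_future)
    scheduled = set()
    join_tasks = []

    async def join(id_):
        # Waits until both sides are known; other tasks run in the meantime.
        pair = (await name_futures[id_], await age_futures[id_])
        counts[pair] += 1
        # Drop per-id state so memory stays proportional to unmatched ids.
        scheduled.discard(id_)
        del name_futures[id_], age_futures[id_]

    async def process_page(pages, page_number, futures):
        for id_, value in await fetch_page(pages, page_number):
            futures[id_].set_result(value)
            if id_ not in scheduled:
                scheduled.add(id_)
                join_tasks.append(asyncio.create_task(join(id_)))

    page_tasks = []
    # Interleave name and age pages in the order we want them started.
    for name_page, age_page in zip_longest(
        range(len(NAME_PAGES)), range(len(AGE_PAGES))
    ):
        if name_page is not None:
            page_tasks.append(
                asyncio.create_task(process_page(NAME_PAGES, name_page, name_futures))
            )
        if age_page is not None:
            page_tasks.append(
                asyncio.create_task(process_page(AGE_PAGES, age_page, age_futures))
            )
    await asyncio.gather(*page_tasks)
    await asyncio.gather(*join_tasks)
    return dict(counts)


print(asyncio.run(count_name_age_pairs()))
```

This sketch schedules every page up front, so it inherits the memory caveat of that approach; a real implementation would also need to handle ids that appear in only one source, which would otherwise leave a join task waiting forever.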

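asyncio may not solve all your parallel-processing woes. You mentioned that "processing" each page you iterate takes forever. If it takes forever because you are waiting on responses from a server, then this architecture is a neat, lightweight way to do what you need (just make sure the I/O is done with tools that are aware of the asyncio loop).

If it takes forever because Python is crunching numbers or moving things around with CPU and memory, asyncio's single-threaded event loop will not help you much, because only one Python operation happens at a time. In that scenario, if you want to stick with asyncio and the subtask pattern, you might look into using loop.run_in_executor with a pool of Python interpreter processes. You could also develop a solution using the concurrent.futures library with a process pool instead of using asyncio.

Note: the example generator you gave might be confusing to some, because it uses yield from to delegate generation to an inner generator. It just so happens that asyncio coroutines use the same expression to await a future's result and to tell the loop it can run other coroutines' code if it wants to.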
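The two meanings can be put side by side in a short sketch. The generator functions below use `yield from` purely for delegation, while the coroutine uses `await`, the current spelling of what generator-based coroutines wrote as `yield from`. All names here are illustrative.

```python
import asyncio


def iterate_page(page):
    # Generator delegation: hand out every item of the inner iterable.
    yield from page


def iterate_all(pages):
    for page in pages:
        yield from iterate_page(page)


async def fetch_value():
    # Coroutine suspension: give control back to the event loop until
    # the awaited operation completes.
    await asyncio.sleep(0)
    return "done"


print(list(iterate_all([[1, 2], [3]])))
print(asyncio.run(fetch_value()))
```

The first call involves no event loop at all; only the second one needs asyncio to drive it.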

Regarding python - Python asyncio: completing in order, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35699601/
