python - Why does asyncio's run_in_executor give so little parallelization when making HTTP requests?

Reposted by: 行者123 | Updated: 2023-11-28 17:03:27

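I've written a benchmarking utility that batch-queries a REST endpoint. It does so in three ways:

1. sequentially, using the requests library,
2. concurrently, using the requests library, but wrapping each request with loop.run_in_executor(),
3. concurrently, using the aiohttp library.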

Below are the results for different batch sizes (all times in seconds):

batch_size=16

       concur_times  seq_times  concur_aiohttp_times
count     50.000000  50.000000             50.000000
mean       0.123786   0.235883              0.087843
std        0.009733   0.018039              0.029977
min        0.108682   0.210515              0.071560
25%        0.118666   0.222436              0.075565
50%        0.121978   0.231876              0.080050
75%        0.125740   0.242939              0.086345
max        0.169194   0.283809              0.267874

batch_size=4

       concur_times  seq_times  concur_aiohttp_times
count     50.000000  50.000000             50.000000
mean       0.080764   0.091276              0.052807
std        0.008342   0.016509              0.033814
min        0.069541   0.078517              0.041993
25%        0.076142   0.082242              0.044563
50%        0.079046   0.085540              0.045735
75%        0.081645   0.092659              0.049428
max        0.111622   0.170785              0.281397

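As the results show, the aiohttp routine is consistently the most parallel. More importantly, for small batches (4), the second approach using loop.run_in_executor (the "concur_times" column) achieves only about a 1/9 speedup over the sequential approach (mean 0.081 s versus 0.091 s).

Why is that? Is my code at fault? I've included it below.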

I've tried swapping the network IO for sleep and asyncio.sleep, and that produced the expected result: methods 2 and 3 are equally fast, and method 1 is batch_size times slower.
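A minimal sketch of that substitution, assuming a hypothetical 50 ms delay (DELAY) as a stand-in for one HTTP round trip. The function names here are illustrative and are not the ones used in the benchmark itself:

```python
import asyncio
import time
from timeit import default_timer as now

DELAY = 0.05  # assumed stand-in for one HTTP round trip, in seconds

def blocking_job(n):
    # Replaces the blocking requests call with a plain sleep.
    time.sleep(DELAY)
    return n

async def async_job(n):
    # Replaces the aiohttp call with a non-blocking sleep.
    await asyncio.sleep(DELAY)
    return n

def seq(batch):
    # Method 1: one job after another.
    return [blocking_job(n) for n in batch]

async def concur_executor(batch):
    # Method 2: each blocking job runs in the loop's default executor.
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(None, blocking_job, n) for n in batch]
    return await asyncio.gather(*futures)

async def concur_native(batch):
    # Method 3: native coroutines scheduled on the event loop.
    return await asyncio.gather(*(async_job(n) for n in batch))

def timed(f):
    # Return (elapsed seconds, result) for a zero-argument callable.
    start = now()
    result = f()
    return now() - start, result

batch = range(1, 5)
seq_time, seq_result = timed(lambda: seq(batch))
exec_time, exec_result = timed(lambda: asyncio.run(concur_executor(batch)))
native_time, native_result = timed(lambda: asyncio.run(concur_native(batch)))

print("sequential: %.3f s" % seq_time)
print("executor:   %.3f s" % exec_time)
print("native:     %.3f s" % native_time)
```

With sleeps in place of the network, the two concurrent variants should finish in roughly one DELAY, while the sequential one takes about batch_size times as long.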

Code:

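import asyncio
import requests
# NOTE: the star import replaces the built-in map with cytoolz's curried map,
# so map(f) returns a function that still expects the iterable.
from cytoolz.curried import *
import pandas as pd
from timeit import default_timer as now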

url = 'https://jsonplaceholder.typicode.com/todos/'

def dl_todo_with_requests(session, n):
        response = session.get(url + str(n))
        assert(response.status_code == 200)
        text = response.text
        return text

dl_todo_with_requests = curry(dl_todo_with_requests)

def seq_dl(todos_to_get):
        with requests.Session() as session:
                todos = pipe(
                        todos_to_get,
                        map( dl_todo_with_requests(session) ),
                        list )
                return todos

get_todos_from_futures = lambda futures: \
        pipe( futures,
                map( lambda fut: fut.result() ),
                list
            )

async def concur_dl(todos_to_get):
        loop = asyncio.get_running_loop()
        with requests.Session() as session:
                # Submit one blocking session.get() per todo to the loop's default
                # executor (None), then wait for every resulting future.
                completed_futures, _pending = await \
                        pipe(
                        todos_to_get,
                        map( lambda n:
                                loop.run_in_executor(
                                None,
                                lambda: dl_todo_with_requests(session, n)
                                )),
                        list,
                        asyncio.wait
                        )
                todos = get_todos_from_futures(completed_futures)
                return todos

import aiohttp
async def concur_dl_aiohttp(todos_to_get):
        async def dl(session, todo):
                async with session.get(url + str(todo)) as resp:
                        assert(resp.status == 200)
                        # resp.text() is a coroutine; it must be awaited to get the body
                        return await resp.text()
        dl = curry(dl)
        async with aiohttp.ClientSession() as session:
                unexecuted = pipe(
                        todos_to_get,
                        map( dl(session) ),
                        # asyncio.wait() rejects bare coroutines since Python 3.11,
                        # so wrap each one in a task first
                        map( asyncio.ensure_future ),
                        list )
                completed_futures, _pending = await asyncio.wait(unexecuted)
                todos = get_todos_from_futures(completed_futures)
                return todos


def check_todos_received(expected_count, todos):
        # Sanity-check a finished batch: one non-empty body per requested todo
        assert(len(todos) == expected_count)
        todo_has_content = lambda todo: len(todo) > len('{}')
        assert(all(map(todo_has_content, todos)))
        return True

check_todos_received = curry(check_todos_received)

def measure_it(f):
        start = now()
        f()
        elapsed = now() - start
        return elapsed

inspect = lambda f, it: map(do(f), it)
inspect = curry(inspect)

def bench(n_iters=50,batch_size=4):
        todos_to_get = range(1,batch_size+1)
        check = check_todos_received(len(todos_to_get))
        seq_dl(todos_to_get)
        # heat caches, if any
        # do(check) runs the check once on the whole batch; the lazy
        # inspect() used here before was never consumed, so the check never ran
        measure_seq = lambda: pipe(
                        seq_dl(todos_to_get),
                        do(check) )
        measure_concur = lambda: pipe(
                        asyncio.run(concur_dl(todos_to_get)),
                        do(check) )
        measure_concur_aiohttp = lambda: pipe(
                        asyncio.run(concur_dl_aiohttp(todos_to_get)),
                        do(check) )
        do_the_bench = lambda dl_f, title: \
               pipe( range(n_iters),
                       inspect(
                               lambda n: \
                               print("doing %s/%s %s batch download" \
                                       % (n+1,n_iters,title))),
                        map(lambda _: measure_it(dl_f)),
                        list )
        concur_times = do_the_bench(measure_concur,'concurrent')
        concur_aiohttp_times = do_the_bench(measure_concur_aiohttp,'concurrent_aiohttp')
        seq_times = do_the_bench(measure_seq,'sequential')
        return dict(
                concur_times=concur_times,
                seq_times=seq_times,
                concur_aiohttp_times=concur_aiohttp_times)

The benchmark is run like this: bench(n_iters=50,batch_size=4). The output is then passed through lambda output: pandas.DataFrame(output).describe() to produce the tables.
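Both concurrent routines above collect their results with asyncio.wait, which returns sets of futures, so the todos come back in no particular order. A minimal sketch, assuming a hypothetical fake_dl coroutine in place of the real download, of how asyncio.gather would return them in request order instead:

```python
import asyncio

async def fake_dl(n):
    # Hypothetical stand-in for a download; later ids finish first.
    await asyncio.sleep(0.01 * (5 - n))
    return n

async def with_gather(batch):
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(fake_dl(n) for n in batch))

async def with_wait(batch):
    # wait() returns a set of finished tasks, so iteration order is arbitrary.
    tasks = [asyncio.ensure_future(fake_dl(n)) for n in batch]
    done, _pending = await asyncio.wait(tasks)
    return [task.result() for task in done]

gathered = asyncio.run(with_gather(range(1, 5)))
waited = asyncio.run(with_wait(range(1, 5)))

print(gathered)        # request order
print(sorted(waited))  # same items once sorted
```

This does not affect the timings, only the order in which the results are handed back.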

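Best answer

The default executor for asyncio's run_in_executor is ThreadPoolExecutor, which uses Python threads, so it is also affected by the GIL, as described in this thread.

In your case, only one thread with an async job runs at a time, which is why aiohttp shows better performance.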
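To make the thread pool visible, here is a minimal sketch that passes an explicit ThreadPoolExecutor to run_in_executor instead of None. The blocking_fetch function, the "dl" thread name prefix and the max_workers value are assumptions for illustration. They are not part of the benchmark above.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_fetch(n):
    # Hypothetical stand-in for a blocking HTTP call.
    time.sleep(0.02)
    # Report which thread actually ran the job.
    return n, threading.current_thread().name

async def main(batch, max_workers):
    loop = asyncio.get_running_loop()
    # An explicit pool replaces the default executor that None would select.
    with ThreadPoolExecutor(max_workers=max_workers,
                            thread_name_prefix="dl") as pool:
        futures = [loop.run_in_executor(pool, blocking_fetch, n)
                   for n in batch]
        return await asyncio.gather(*futures)

results = asyncio.run(main(range(1, 5), max_workers=4))
numbers = [n for n, _ in results]
thread_names = {name for _, name in results}

print(numbers)
print(sorted(thread_names))
```

Every job reports a worker thread from the pool rather than the main thread, which is the same mechanism the default executor uses.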

Regarding python - Why does asyncio's run_in_executor give so little parallelization when making HTTP requests?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52769321/
