python - Efficiently combining the results of concurrent.futures parallel execution?

Reposted by 行者123, updated 2023-12-01 08:51:46

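I have a pandas DataFrame with roughly 100M rows. The parallel processing works very well on a multi-core machine, with every core at 100% utilization. However, executor.map() returns a generator, so to actually collect the processed results I iterate over that generator. This is very, very slow (hours), partly because it runs on a single core and partly because of the loop. In fact, it is much slower than the actual processing in my_function().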

Is there a better way (perhaps concurrent and/or vectorized)?

Edit: using pandas 0.23.4 (the latest at the time of writing) with Python 3.7.0

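import concurrent.futures  # a bare "import concurrent" does not load the futures submodule
import pandas as pd

def collect_results(my_function, list_of_values):
    # my_function and list_of_values are defined elsewhere (not shown in the question)
    df = pd.DataFrame({'col1': [], 'col2': [], 'col3': []})

    with concurrent.futures.ProcessPoolExecutor() as executor:
        gen = executor.map(my_function, list_of_values, chunksize=1000)

    # the following is single-threaded and also very slow
    for x in gen:
        df = pd.concat([df, x])  # anything better than doing this?
    return df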

Best answer

Here is a benchmark relevant to your case: https://stackoverflow.com/a/31713471/5588279

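As you can see, calling concat (append) many times is very inefficient. You should call pd.concat(gen) just once. I believe the underlying implementation then preallocates all the memory it needs.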
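A minimal sketch of the single-concat approach, assuming every worker call returns a DataFrame with the same columns. The names make_frame and gather_frames are hypothetical and only serve the illustration; make_frame stands in for my_function.

```python
import concurrent.futures

import pandas as pd


def make_frame(value):
    # Hypothetical stand-in for my_function: one small DataFrame per input value.
    return pd.DataFrame({"col1": [value], "col2": [value * 2], "col3": [value * 3]})


def gather_frames(executor, func, values, chunksize=1):
    # Collect every partial result first (map yields them in input order),
    # then build the final DataFrame with a single pd.concat call.
    frames = list(executor.map(func, values, chunksize=chunksize))
    return pd.concat(frames, ignore_index=True)
```

With the question's code this would be called as gather_frames(executor, my_function, list_of_values, chunksize=1000) inside the with block. Passing ignore_index=True renumbers the rows; drop it if the per-chunk index has to be preserved.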

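In your case, memory is allocated on every iteration: each pd.concat([df, x]) builds a new DataFrame and copies all the rows collected so far.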
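To get a feel for the difference, the copying work can be estimated with plain arithmetic. The sketch below is only a rough model, assuming each concat call copies every row of its inputs into a fresh result. It does not measure pandas itself, and both function names are hypothetical.

```python
def rows_copied_incremental(chunk_sizes):
    # Model of df = pd.concat([df, x]) in a loop: every step copies all rows
    # accumulated so far plus the new chunk.
    total_rows = 0
    copied = 0
    for size in chunk_sizes:
        total_rows += size
        copied += total_rows
    return copied


def rows_copied_single(chunk_sizes):
    # Model of one pd.concat over all chunks: each row is copied once.
    return sum(chunk_sizes)


chunks = [1000] * 100  # 100 chunks of 1000 rows each
print(rows_copied_incremental(chunks))  # 5050000
print(rows_copied_single(chunks))       # 100000
```

Under this model the loop's cost grows quadratically with the number of chunks, while the single call grows linearly.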

A similar question about efficiently combining the results of concurrent.futures parallel execution can be found on Stack Overflow: https://stackoverflow.com/questions/53068070/
