
Python dataframes - How to apply threading/multiprocessing here to speed things up

Reposted. Author: 行者123. Updated: 2023-12-01 07:14:50

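I have a DataFrame with millions of rows, and I need to run a function on col_1 and col_2 of every row. See the example below. Suppose each call takes 2 seconds and I have 3 rows, so it currently takes 6 seconds. I would like to use threading here to bring that down to 2 seconds. How can I do that?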

import pandas as pd
import time

def add(a, b):
    total = a + b  # named total to avoid shadowing the built-in sum()
    time.sleep(2)  # stand-in for the real function, which is slow
    print("sum of %d and %d is %d" % (a, b, total))

data = [[10, 10], [9, 12], [100, 13]]
df = pd.DataFrame(data, columns=['col_1', 'col_2'])

start_time = time.time()
# apply() calls add() on one row at a time, so three rows take about 6 seconds
df.apply(lambda x: add(x.col_1, x.col_2), axis=1)
print("--- %s seconds ---" % (time.time() - start_time))

Best answer

OK, thanks everyone. The asynchronous approach works.

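import pandas as pd
import time
from multiprocessing.dummy import Pool  # thread-based Pool with the multiprocessing API

pool_size = 5  # number of worker threads

pool = Pool(pool_size)

def add(a, b):
    total = a + b  # named total to avoid shadowing the built-in sum()
    time.sleep(2)  # stand-in for the real function, which is slow
    print("sum of %d and %d is %d" % (a, b, total))

data = [[10, 10], [9, 12], [100, 13]]
df = pd.DataFrame(data, columns=['col_1', 'col_2'])

start_time = time.time()
for ind in df.index:
    # Submit one task per row; apply_async returns immediately
    pool.apply_async(add, args=(df.at[ind, 'col_1'], df.at[ind, 'col_2']))
pool.close()  # no more tasks will be submitted
pool.join()   # wait for all submitted tasks to finish

print("--- %s seconds ---" % (time.time() - start_time))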
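The answer's add() only prints, and apply_async discards return values (and hides exceptions) unless .get() is called on each result. If the per-row results need to go back into the DataFrame, one possible variation is the standard library's concurrent.futures.ThreadPoolExecutor. This is a sketch under assumptions, not the original poster's method: it assumes add() can return its result, it shortens the sleep so it runs quickly, and the "sum" column name is made up for illustration. Threads help here because time.sleep (like most I/O waits) releases the GIL; a CPU-bound function would not speed up the same way.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def add(a, b):
    """Stand-in for a slow, I/O-style function; returns the sum instead of printing it."""
    time.sleep(0.2)  # assumption: shortened from the 2 s in the question
    return a + b


df = pd.DataFrame([[10, 10], [9, 12], [100, 13]], columns=["col_1", "col_2"])

start_time = time.time()
# max_workers=5 mirrors pool_size in the answer; it is a tuning choice
with ThreadPoolExecutor(max_workers=5) as executor:
    # map() yields results in input order, so they line up with the rows of df
    results = list(executor.map(add, df["col_1"], df["col_2"]))
df["sum"] = results  # hypothetical column name for the collected results
elapsed = time.time() - start_time

print(df)
print("--- %s seconds ---" % elapsed)
```

Because map() re-raises any exception from a worker when its result is consumed, a failure in add() surfaces in the main thread instead of being silently dropped.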

Regarding "Python dataframes - How to apply threading/multiprocessing here to speed things up", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58027947/
