python - Speeding up multi-loop data calculations in Pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:12:28

Here is my problem. Take the following DataFrame as an example:

[Image: example DataFrame]

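  • The DataFrame df has 8 columns, and every column holds finite values.
  • What I want to do:
    • a. Loop over the DataFrame row by row.
    • b. In each row, replace the values of B1, B2, B3, B4, B5, and B6 with B* x A.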

The code is as follows:

for i in range(0, len(df), 1):
    col_B = ["B1", "B2", "B3", "B4", "B5", "B6"]
    for j in range(len(col_B)):
        # Multiply each B cell by the value of A in the same row
        df[col_B[j]].iloc[i] = df[col_B[j]].iloc[i] * df.A.iloc[i]

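On my real data, which has 224 rows and 9 columns, looping over all of these cells took 0:01:03.

How can I speed up looping in Pandas?

Any suggestions would be appreciated.

Best answer

You can first filter the DataFrame and then multiply with mul: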

print(df.filter(like='B').mul(df.A, axis=0))

Sample:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 3],
                   'B1': [4, 5, 6],
                   'B2': [7, 8, 9],
                   'B3': [1, 3, 5],
                   'B4': [5, 3, 6],
                   'B5': [7, 4, 3],
                   'B6': [1, 3, 7]})

print(df)
   A  B1  B2  B3  B4  B5  B6
0  1   4   7   1   5   7   1
1  2   5   8   3   3   4   3
2  3   6   9   5   6   3   7

print(df.filter(like='B').mul(df.A, axis=0))
   B1  B2  B3  B4  B5  B6
0   4   7   1   5   7   1
1  10  16   6   6   8   6
2  18  27  15  18   9  21
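
To make clearer what filter and mul do here, the following is a minimal sketch on a smaller frame (only two B columns, chosen for brevity). It compares the mul(..., axis=0) result with the same arithmetic written as plain NumPy broadcasting. The variable names b_cols, via_mul and via_numpy are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B1': [4, 5, 6],
                   'B2': [7, 8, 9]})

# filter(like='B') keeps the columns whose name contains the substring 'B'
b_cols = df.filter(like='B')

# axis=0 aligns the Series df['A'] with the row index,
# so every value in row i is multiplied by A[i]
via_mul = b_cols.mul(df['A'], axis=0)

# The same arithmetic in NumPy: turn A into a column vector of shape (3, 1)
# and let broadcasting stretch it across the B columns
via_numpy = b_cols.to_numpy() * df['A'].to_numpy()[:, None]

print(via_mul)
print(np.array_equal(via_mul.to_numpy(), via_numpy))
```

Because the whole multiplication happens in one array operation, no Python-level loop over rows or columns is needed.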

If you need column A as well, use concat:

print(pd.concat([df.A, df.filter(like='B').mul(df.A, axis=0)], axis=1))
   A  B1  B2  B3  B4  B5  B6
0  1   4   7   1   5   7   1
1  2  10  16   6   6   8   6
2  3  18  27  15  18   9  21
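
If the goal is to overwrite the B columns in the existing DataFrame, as the loop in the question does, an alternative to concat is to assign the product back to those columns. This is a sketch of that variant, not the answer's own method; the list name b_cols and the use of startswith to pick the columns are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B1': [4, 5, 6],
                   'B2': [7, 8, 9],
                   'B3': [1, 3, 5],
                   'B4': [5, 3, 6],
                   'B5': [7, 4, 3],
                   'B6': [1, 3, 7]})

# Collect the names of the columns to update (here: every name starting with 'B')
b_cols = [c for c in df.columns if c.startswith('B')]

# Multiply all B columns by A row-wise and write the result back in one step;
# column A itself is left unchanged
df[b_cols] = df[b_cols].mul(df['A'], axis=0)

print(df)
```

This keeps the original column order and avoids building a second DataFrame with concat.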

Timings:

len(df)=3:

In [416]: %timeit (pd.concat([df.A, df.filter(like='B').mul(df.A, axis=0)], axis=1))
1000 loops, best of 3: 1.01 ms per loop

In [417]: %timeit loop(df)
100 loops, best of 3: 3.28 ms per loop

len(df)=30k:

In [420]: %timeit (pd.concat([df.A, df.filter(like='B').mul(df.A, axis=0)], axis=1))
The slowest run took 4.00 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3 ms per loop

In [421]: %timeit loop(df)
1 loop, best of 3: 35.6 s per loop

Timing code:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B1': [4, 5, 6],
                   'B2': [7, 8, 9],
                   'B3': [1, 3, 5],
                   'B4': [5, 3, 6],
                   'B5': [7, 4, 3],
                   'B6': [1, 3, 7]})

print(df)

# Repeat the 3-row sample 10,000 times to get 30k rows
df = pd.concat([df]*10000).reset_index(drop=True)

print(pd.concat([df.A, df.filter(like='B').mul(df.A, axis=0)], axis=1))

def loop(df):
    # Element-wise loop from the question, kept for comparison
    for i in range(0, len(df), 1):
        col_B = ["B1", "B2", "B3", "B4", "B5", "B6"]
        for j in range(len(col_B)):
            # Chained assignment as in the question; newer pandas versions may warn here
            df[col_B[j]].iloc[i] = df[col_B[j]].iloc[i] * df.A.iloc[i]
    return df

print(loop(df))
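
Before comparing timings it can be worth confirming that both approaches produce the same result. The sketch below does that on a small frame. The function names loop_at and vectorized are hypothetical, and the loop is rewritten with df.at on a copy (an assumption made here so the check does not depend on chained assignment); it is not the loop that was timed above.

```python
import pandas as pd

def loop_at(df):
    # Row-by-row version, written with .at on a copy of the input
    out = df.copy()
    b_cols = ["B1", "B2", "B3", "B4", "B5", "B6"]
    for i in out.index:
        for col in b_cols:
            out.at[i, col] = out.at[i, col] * out.at[i, "A"]
    return out

def vectorized(df):
    # The answer's approach: filter the B columns, multiply by A, re-attach A
    return pd.concat([df['A'], df.filter(like='B').mul(df['A'], axis=0)], axis=1)

sample = pd.DataFrame({'A': [1, 2, 3],
                       'B1': [4, 5, 6],
                       'B2': [7, 8, 9],
                       'B3': [1, 3, 5],
                       'B4': [5, 3, 6],
                       'B5': [7, 4, 3],
                       'B6': [1, 3, 7]})

# A small repeat count keeps the loop fast enough for a quick check
sample = pd.concat([sample] * 10).reset_index(drop=True)

# Raises an AssertionError if the two results differ
pd.testing.assert_frame_equal(loop_at(sample), vectorized(sample))
print("loop and vectorized results match")
```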

For more on python - speeding up multi-loop data calculations in Pandas, see the similar question on Stack Overflow: https://stackoverflow.com/questions/37538631/
