
python - What is the best way to write a function that calculates row elements in pandas?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:45:16

I have a base table (it was shown as an image in the original post, which is not reproduced here).

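col1 is a column of independent values, and col2 is an aggregate computed over each combination of Country and Type. I want to calculate columns col3 to col5 with the following logic: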

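  1. col3: the ratio of each col1 element to the sum of col1
  2. col4: the ratio of each col1 element to the corresponding col2 element
  3. col5: the natural exponential of the product of the row's col3 and col4 elements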
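Written per row i, with the sum running over every row of col1, the three rules above amount to:

```latex
\text{col3}_i = \frac{\text{col1}_i}{\sum_{j} \text{col1}_j}, \qquad
\text{col4}_i = \frac{\text{col1}_i}{\text{col2}_i}, \qquad
\text{col5}_i = \exp\left(\text{col3}_i \cdot \text{col4}_i\right)
```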

I wrote a function like the one below to do this:

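import numpy as np

def calculate(df):
    # assumes col3..col5 already exist in df; df['col3'] raises KeyError otherwise
    for i in range(len(df)):
        # chained assignment (df[col].loc[i] = ...) is what triggers the warning below
        df['col3'].loc[i] = df['col1'].loc[i]/sum(df['col1'])
        df['col4'].loc[i] = df['col1'].loc[i]/df['col2'].loc[i]
        df['col5'].loc[i] = np.exp(df['col3'].loc[i]*df['col4'].loc[i])
    return df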

The function runs and gives the expected result, but the notebook also raises a warning:

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

I'm not sure whether the function I wrote here is the best approach. Any help would be appreciated! Thanks.

Best answer

I think it is better to avoid apply and loops in pandas; a vectorized solution is both cleaner and faster:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':[4,5,4,5,5,4],
                   'col2':[7,8,9,4,2,3],
                   'col3':[1,3,5,7,1,0],
                   'col4':[5,3,6,9,2,4],
                   'col5':[1,4,3,4,0,4]})

print (df)
   col1  col2  col3  col4  col5
0     4     7     1     5     1
1     5     8     3     3     4
2     4     9     5     6     3
3     5     4     7     9     4
4     5     2     1     2     0
5     4     3     0     4     4

df['col3'] = df['col1']/(df['col1']).sum()
df['col4'] = df['col1']/df['col2']
df['col5'] = np.exp(df['col3']*df['col4'])
print (df)
   col1  col2      col3      col4      col5
0     4     7  0.148148  0.571429  1.088343
1     5     8  0.185185  0.625000  1.122705
2     4     9  0.148148  0.444444  1.068060
3     5     4  0.185185  1.250000  1.260466
4     5     2  0.185185  2.500000  1.588774
5     4     3  0.148148  1.333333  1.218391
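Since the question asks for a function, the same three vectorized steps can also be wrapped in one. This is a minimal sketch that swaps in DataFrame.assign (an alternative idiom, not what the answer above uses) so the caller's frame is left unchanged; the name calculate_vectorized is hypothetical. assign evaluates its keyword arguments in order, so the col5 lambda sees the col3 and col4 columns created just before it.

```python
import numpy as np
import pandas as pd


def calculate_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with col3..col5 (re)computed; df itself is not modified."""
    return df.assign(
        # share of each col1 value in the column total
        col3=lambda d: d["col1"] / d["col1"].sum(),
        # ratio of col1 to the col2 value in the same row
        col4=lambda d: d["col1"] / d["col2"],
        # uses the col3/col4 computed by the two arguments above
        col5=lambda d: np.exp(d["col3"] * d["col4"]),
    )
```

Calling calculate_vectorized(df) on the sample frame should reproduce the col3..col5 values printed above while leaving the original integer columns of df untouched.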

Timings:

df = pd.DataFrame({'col1':[4,5,4,5,5,4],
                   'col2':[7,8,9,4,2,3],
                   'col3':[1,3,5,7,1,0],
                   'col4':[5,3,6,9,2,4],
                   'col5':[1,4,3,4,0,4]})

# print (df)

# 6000 rows
df = pd.concat([df] * 1000, ignore_index=True)

In [211]: %%timeit
...: df['col3'] = df['col1']/(df['col1']).sum()
...: df['col4'] = df['col1']/df['col2']
...: df['col5'] = np.exp(df['col3']*df['col4'])
...:
1.49 ms ± 104 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
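The %%timeit cell magic is specific to IPython/Jupyter. In a plain script, a rough equivalent can use the standard-library timeit module. This is a minimal sketch; the helper name add_columns is hypothetical, only col1 and col2 are rebuilt since the other columns are overwritten anyway, and no particular timing is claimed because results depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({'col1': [4, 5, 4, 5, 5, 4],
                     'col2': [7, 8, 9, 4, 2, 3]})
df = pd.concat([base] * 1000, ignore_index=True)  # 6000 rows, as above


def add_columns(frame):
    # the same three vectorized assignments as in the answer
    frame['col3'] = frame['col1'] / frame['col1'].sum()
    frame['col4'] = frame['col1'] / frame['col2']
    frame['col5'] = np.exp(frame['col3'] * frame['col4'])
    return frame


# 5 runs of 100 calls each; report mean and spread of the per-call time in ms
runs = timeit.repeat(lambda: add_columns(df), number=100, repeat=5)
per_call_ms = [t / 100 * 1e3 for t in runs]
print(f"{np.mean(per_call_ms):.3f} ms +/- {np.std(per_call_ms):.3f} ms per call")
```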

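Unfortunately, the loop solution is very slow for this example, so it was tested only on a 60-row DataFrame:

# 60 rows: the original 6-row sample repeated 10 times
df = pd.concat([df] * 10, ignore_index=True)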

In [3]: %%timeit
...: (calculate(df))
...:
C:\Anaconda3\lib\site-packages\pandas\core\indexing.py:194: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
10.2 s ± 410 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
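The warning in both outputs comes from chained assignment: df['col3'].loc[i] = ... first selects the column df['col3'] and then writes into that intermediate object, which pandas cannot guarantee is a view of df. The loop is also slow partly because sum(df['col1']) is recomputed on every iteration, which makes the function quadratic in the number of rows, and every scalar .loc access carries indexing overhead. If a row-by-row loop must be kept, a minimal sketch that avoids chained assignment writes each cell with a single .loc[row, column] call and computes the sum once. The name calculate_loop is hypothetical, and the sketch assumes a unique index and numeric col1/col2 columns; the vectorized version remains the faster option.

```python
import numpy as np
import pandas as pd


def calculate_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Row-by-row variant that avoids chained assignment (hypothetical helper)."""
    out = df.copy()  # explicit copy, so the caller's frame is never modified
    for col in ("col3", "col4", "col5"):
        out[col] = 0.0  # float columns, so storing ratios needs no dtype upcast
    total = out["col1"].sum()  # computed once instead of on every iteration
    for i in out.index:
        # one .loc[row, column] call per write: no intermediate Series involved
        out.loc[i, "col3"] = out.loc[i, "col1"] / total
        out.loc[i, "col4"] = out.loc[i, "col1"] / out.loc[i, "col2"]
        out.loc[i, "col5"] = np.exp(out.loc[i, "col3"] * out.loc[i, "col4"])
    return out
```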

For "python - What is the best way to write a function that calculates row elements in pandas?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50317196/

Copyright 2021 - 2024 cfsdn All Rights Reserved. Sichuan ICP registration No. 2022000587