
python - How do I create a new column with a conditional cumulative sum using pandas?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:32:32

The following code creates a random DataFrame whose values are -1, 0, or 1:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(-1, 2, size=(100, 1)), columns=['val'])

print(df['val'].value_counts())

Let's see what it contains:

-1    36
 0    35
 1    29
Name: val, dtype: int64

Next, I am trying to create a new column named mysum that holds a conditional cumulative sum following these rules:

  • If val = 1 and mysum >= 0, then mysum = mysum + 1.

  • If val = 1 and mysum < 0, then mysum = mysum + 2.

  • If val = -1 and mysum <= 0, then mysum = mysum - 1.

  • If val = -1 and mysum > 0, then mysum = mysum - 2.

  • If val = 0 and mysum < 0, then mysum = mysum + 1.

  • If val = 0 and mysum > 0, then mysum = mysum - 1.

  • If val = 0 and mysum = 0, then mysum = mysum.
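
The rules above can be sketched as a plain Python loop that carries the running total from one row to the next. This is only a minimal illustration: the names step and conditional_running_sum are hypothetical, and the starting value of 0 is an assumption (the question initialises mysum to 0 but does not state the start explicitly).

```python
def step(acc, x):
    """Apply one rule: acc is the running total so far, x is the current val."""
    if x == 1:
        return acc + 1 if acc >= 0 else acc + 2
    if x == -1:
        return acc - 1 if acc <= 0 else acc - 2
    # x == 0: move one step toward zero, or stay at zero
    if acc < 0:
        return acc + 1
    if acc > 0:
        return acc - 1
    return acc


def conditional_running_sum(values, start=0):
    """Return the list of running totals (start=0 is an assumption)."""
    out = []
    acc = start
    for x in values:
        acc = step(acc, x)
        out.append(acc)
    return out


print(conditional_running_sum([1, 1, -1, -1, -1, 0, 1]))
# [1, 2, 0, -1, -2, -1, 1]
```

Each output value depends on the previous output, which is why the result cannot be obtained from val alone.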

So I'm afraid it is not as simple as this:

df['mysum'] = df['val'].cumsum()

So I tried the following:

df['mysum'] = 0

df['mysum'] = np.where((df['val'] == 1) & (df['mysum'].cumsum() >= 0), (df['mysum'].cumsum() + 1), df['mysum'].cumsum())
df['mysum'] = np.where((df['val'] == 1) & (df['mysum'].cumsum() < 0), (df['mysum'].cumsum() + 2), df['mysum'].cumsum())

df['mysum'] = np.where((df['val'] == -1) & (df['mysum'].cumsum() <= 0), (df['mysum'].cumsum() - 1), df['mysum'].cumsum())
df['mysum'] = np.where((df['val'] == -1) & (df['mysum'].cumsum() > 0), (df['mysum'].cumsum() - 2), df['mysum'].cumsum())

df['mysum'] = np.where((df['val'] == 0) & (df['mysum'].cumsum() > 0), (df['mysum'].cumsum() - 1), df['mysum'].cumsum())
df['mysum'] = np.where((df['val'] == 0) & (df['mysum'].cumsum() < 0), (df['mysum'].cumsum() + 1), df['mysum'].cumsum())


print(df['mysum'].value_counts())
print(df)

But the mysum column does not accumulate!
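
One possible way to see why: np.where evaluates its condition over the whole column at once, using whatever the column currently holds. The small sketch below (the four-row input is made up for illustration) reproduces only the first np.where statement of the attempt. Because mysum starts as all zeros, its cumsum is all zeros too, so the statement merely marks the rows where val is 1 and never sees a running total.

```python
import numpy as np
import pandas as pd

# Tiny made-up input, for illustration only
df = pd.DataFrame({'val': [1, 1, -1, 0]})
df['mysum'] = 0

running = df['mysum'].cumsum()  # all zeros, because mysum is all zeros
step1 = np.where((df['val'] == 1) & (running >= 0), running + 1, running)

print(running.tolist())  # [0, 0, 0, 0]
print(step1.tolist())    # [1, 1, 0, 0]
```

Under the stated rules the second row would be 2, since the first row has already raised the total to 1. That dependency on the previous result is what a single column-wise expression cannot capture.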

Here is a fiddle you can try: https://repl.it/FaXZ/8

Best answer

A more efficient solution (see also the question "generalized cumulative functions in NumPy/SciPy?"):

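import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(-1, 2, size=(100, 1)), columns=['val'])


def my_sum(acc, x):
    # acc is the running total so far, x is the current val
    if x == 0 and acc < 0:
        return acc + 1
    if x == 1 and acc < 0:
        return acc + 2
    if x == -1 and acc <= 0:
        return acc - 1
    if x == 0 and acc > 0:
        return acc - 1
    if x == -1 and acc > 0:
        return acc - 2
    if x == 1 and acc >= 0:
        return acc + 1
    if x == 0 and acc == 0:
        return acc


# Wrap the Python function as a binary ufunc so that .accumulate() is available
u_my_sum = np.frompyfunc(my_sum, 2, 1)
# frompyfunc ufuncs work on Python objects, so accumulate with dtype=object
# and convert the result back to integers afterwards
df['mysum'] = u_my_sum.accumulate(df['val'].to_numpy(), dtype=object).astype(np.int64)
print(df)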
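
As an alternative sketch, not part of the original answer, the same two-argument function can be passed to itertools.accumulate from the standard library. The condensed my_sum below is a rewrite of the answer's function for brevity, and the seeded generator is an assumption added only to make the run repeatable. As with ufunc.accumulate, the first output is the first val itself, which matches the rules when the running total is assumed to start at 0.

```python
from itertools import accumulate

import numpy as np
import pandas as pd


def my_sum(acc, x):
    """Condensed form of the rules: acc is the running total, x the current val."""
    if x == 1:
        return acc + 1 if acc >= 0 else acc + 2
    if x == -1:
        return acc - 1 if acc <= 0 else acc - 2
    # x == 0: move one step toward zero, or stay at zero
    return acc + 1 if acc < 0 else (acc - 1 if acc > 0 else acc)


# Seeded generator (an assumption, used only for reproducibility)
rng = np.random.default_rng(0)
df = pd.DataFrame({'val': rng.integers(-1, 2, size=100)})

# tolist() yields plain Python ints; accumulate feeds each result into the next call
df['mysum'] = list(accumulate(df['val'].tolist(), my_sum))
print(df.head(10))
```

Whether this is faster or slower than the np.frompyfunc version has not been measured here; both still call a Python function once per row.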

Regarding "python - How do I create a new column with a conditional cumulative sum using pandas?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42805632/
