
python - cumsum on a multi-index Pandas DataFrame

Reposted. Author: 行者123. Updated: 2023-12-02 15:56:08

I have the multi-index dataset given below:

import numpy as np
import pandas as pd

arr = np.array([12, 12, 12, 72, 72, 72, 26, 26, 26, 22, 22, 22, 46, 46, 46, 32, 32, 32])
df = pd.DataFrame({'date': ['1/1/2000', '1/1/2000', '1/1/2000',
                            '2/1/2000', '2/1/2000', '2/1/2000',
                            '3/1/2000', '3/1/2000', '3/1/2000',
                            '1/1/2000', '1/1/2000', '1/1/2000',
                            '2/1/2000', '2/1/2000', '2/1/2000',
                            '3/1/2000', '3/1/2000', '3/1/2000'],
                   'type': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                            'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
                   'lags': ['31/12/1999', '30/12/1999', '29/12/1999',
                            '1/1/2000', '31/12/1999', '30/12/1999',
                            '2/1/2000', '1/1/2000', '31/12/1999',
                            '31/12/1999', '30/12/1999', '29/12/1999',
                            '1/1/2000', '31/12/1999', '30/12/1999',
                            '2/1/2000', '1/1/2000', '31/12/1999']})
df["target"] = arr
# Move the three key columns into a MultiIndex (date, type, lags).
df.set_index(['date', 'type', 'lags'], inplace=True)

I am trying to apply cumsum within each type and run this code:

for g_name, g_df in df.groupby("type"):
    df.loc[g_df.index, 'target'] = df.loc[g_df.index, 'target'].cumsum().astype('float64')

However, it accumulates over every row, including the repeated values within a date, and outputs:

                          target
date     type lags
1/1/2000 A    31/12/1999    12.0
              30/12/1999    24.0
              29/12/1999    36.0
2/1/2000 A    1/1/2000     108.0
              31/12/1999   180.0
              30/12/1999   252.0
3/1/2000 A    2/1/2000     278.0
              1/1/2000     304.0
              31/12/1999   330.0
1/1/2000 B    31/12/1999    22.0
              30/12/1999    44.0
              29/12/1999    66.0
2/1/2000 B    1/1/2000     112.0
              31/12/1999   158.0
              30/12/1999   204.0
3/1/2000 B    2/1/2000     236.0
              1/1/2000     268.0
              31/12/1999   300.0

The expected result is:

                          target
date     type lags
1/1/2000 A    31/12/1999    12.0
              30/12/1999    12.0
              29/12/1999    12.0
2/1/2000 A    1/1/2000      84.0
              31/12/1999    84.0
              30/12/1999    84.0
3/1/2000 A    2/1/2000     110.0
              1/1/2000     110.0
              31/12/1999   110.0
1/1/2000 B    31/12/1999    22.0
              30/12/1999    22.0
              29/12/1999    22.0
2/1/2000 B    1/1/2000      68.0
              31/12/1999    68.0
              30/12/1999    68.0
3/1/2000 B    2/1/2000     100.0
              1/1/2000     100.0
              31/12/1999   100.0
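
The expected values appear to be a running total of one value per date within each type, repeated across that date's lags. A quick check of that arithmetic, using the per-date values read off the data above (the variable names are only for illustration):

```python
import numpy as np

# One target value per date, read off the question's data.
per_date_a = [12, 72, 26]   # type A on 1/1/2000, 2/1/2000, 3/1/2000
per_date_b = [22, 46, 32]   # type B on 1/1/2000, 2/1/2000, 3/1/2000

running_a = np.cumsum(per_date_a).tolist()
running_b = np.cumsum(per_date_b).tolist()

print(running_a)  # [12, 84, 110]
print(running_b)  # [22, 68, 100]
```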

How can I get the expected result and update the original data in a pythonic way?

Best answer

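You can group by date and type and take the first value of each group, then group by type to compute the cumulative sum. Finally, reindex the result against the original index to broadcast the values to every row: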

>>> df.groupby(['date', 'type']).first().groupby('type').cumsum().reindex(df.index)
                          target
date     type lags
1/1/2000 A    31/12/1999      12
              30/12/1999      12
              29/12/1999      12
2/1/2000 A    1/1/2000        84
              31/12/1999      84
              30/12/1999      84
3/1/2000 A    2/1/2000       110
              1/1/2000       110
              31/12/1999     110
1/1/2000 B    31/12/1999      22
              30/12/1999      22
              29/12/1999      22
2/1/2000 B    1/1/2000        68
              31/12/1999      68
              30/12/1999      68
3/1/2000 B    2/1/2000       100
              1/1/2000       100
              31/12/1999     100
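
The expression above returns a new frame with integer values, while the question asks to update the original data as floats. The following is a minimal sketch of one way to do that, splitting the same three steps into named intermediates. The names `per_date`, `running` and `keys` are illustrative, and the frame is rebuilt in compact form so the block runs on its own. It assumes the rows already appear in chronological order within each type, which is why `sort=False` is used rather than relying on sorting the date strings.

```python
import pandas as pd

# Rebuild the question's frame in compact form (same rows, same order).
dates = ['1/1/2000', '2/1/2000', '3/1/2000']
lags = [['31/12/1999', '30/12/1999', '29/12/1999'],
        ['1/1/2000', '31/12/1999', '30/12/1999'],
        ['2/1/2000', '1/1/2000', '31/12/1999']]
values = {'A': [12, 72, 26], 'B': [22, 46, 32]}
rows = [(d, t, lag, v)
        for t, vals in values.items()
        for d, lag_list, v in zip(dates, lags, vals)
        for lag in lag_list]
df = pd.DataFrame(rows, columns=['date', 'type', 'lags', 'target'])
df = df.set_index(['date', 'type', 'lags'])

# Step 1: one value per (date, type); sort=False keeps first-appearance order.
per_date = df.groupby(level=['date', 'type'], sort=False)['target'].first()

# Step 2: running total within each type.
running = per_date.groupby(level='type').cumsum()

# Step 3: look up each row's (date, type) pair and write the result back.
keys = df.index.droplevel('lags')
df['target'] = running.reindex(keys).to_numpy().astype('float64')

print(df)
```

Writing the values back through `to_numpy()` sidesteps index alignment between the two-level result and the three-level original, so the assignment depends only on row order.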

Regarding python - cumsum on a multi-index Pandas DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/71503337/
