
python - Shifting a Groupby in Pandas

Reposted. Author: 行者123. Updated: 2023-12-01 02:27:44

I am trying to calculate cumulative revenue by account. Here is some sample data:

import pandas as pd
data = {
'account_id': ['111','111','111','222','222','333','333','333','666','666'],
'company': ['initech','initech','initech','jackson steinem & co','jackson steinem & co','ingen','ingen','ingen','enron','enron'],
'cohort_period': [0,1,2,0,1,0,1,2,0,1],
'revenue':[3.67,9.95,9.95,193.29,299.95,83.03,499.95,99.95,1.52,19.95]
}
df = pd.DataFrame(data)

Which outputs:

In [17]: df
Out[17]:
   account_id  cohort_period               company  revenue
0         111              0               initech     3.67
1         111              1               initech     9.95
2         111              2               initech     9.95
3         222              0  jackson steinem & co   193.29
4         222              1  jackson steinem & co   299.95
5         333              0                 ingen    83.03
6         333              1                 ingen   499.95
7         333              2                 ingen    99.95
8         666              0                 enron     1.52
9         666              1                 enron    19.95

There are plenty of examples of how to do this, and they basically come down to:

df['cumulative_revenue'] = df.groupby('account_id')['revenue'].cumsum()
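
To make the behaviour of that one-liner concrete, here is a minimal sketch on the Initech rows alone. The small frame is a cut-down copy of the sample data, built only for illustration:

```python
import pandas as pd

# Cut-down copy of the sample data: only account 111 (Initech).
df = pd.DataFrame({
    'account_id': ['111', '111', '111'],
    'cohort_period': [0, 1, 2],
    'revenue': [3.67, 9.95, 9.95],
})

# Plain per-account running total, following the row order within each account.
df['cumulative_revenue'] = df.groupby('account_id')['revenue'].cumsum()
print(df)
```

For this account the new column comes out as 3.67, 13.62 and 23.57 (up to floating-point rounding), so the period-0 amount is part of every later total.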

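However, there is a catch: in this data, the revenue in cohort period 0 is prorated, and I don't care about it for analysis purposes. What I need is a cumulative sum that starts at cohort period 1. For example, the cumulative revenue for Initech should look like this: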

0      NaN
1     9.95
2    19.90

Best answer

Here is one way to do it:

# check valid cohort_period
valid_cohort = df.cohort_period.ne(0)

# cumulative sum revenue where cohort_period is not equal to zero and mask otherwise as nan
df['cum_revenue'] = valid_cohort.mul(df.revenue).groupby(df.account_id).cumsum().where(valid_cohort)

print(df)
#    account_id  cohort_period               company  revenue  cum_revenue
# 0         111              0               initech     3.67          NaN
# 1         111              1               initech     9.95         9.95
# 2         111              2               initech     9.95        19.90
# 3         222              0  jackson steinem & co   193.29          NaN
# 4         222              1  jackson steinem & co   299.95       299.95
# 5         333              0                 ingen    83.03          NaN
# 6         333              1                 ingen   499.95       499.95
# 7         333              2                 ingen    99.95       599.90
# 8         666              0                 enron     1.52          NaN
# 9         666              1                 enron    19.95        19.95
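
A possible variant, not part of the original answer, is to blank out the period-0 revenue before grouping. It relies on cumsum skipping NaN values by default while leaving NaN in their place. The sketch below rebuilds the sample data without the company column, and the column name cum_revenue_alt is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'account_id': ['111', '111', '111', '222', '222', '333', '333', '333', '666', '666'],
    'cohort_period': [0, 1, 2, 0, 1, 0, 1, 2, 0, 1],
    'revenue': [3.67, 9.95, 9.95, 193.29, 299.95, 83.03, 499.95, 99.95, 1.52, 19.95],
})

# Replace period-0 revenue with NaN; every other row keeps its value.
masked_revenue = df['revenue'].where(df['cohort_period'].ne(0))

# cumsum skips NaN by default, so period-0 rows stay NaN
# and the running total effectively starts at period 1.
df['cum_revenue_alt'] = masked_revenue.groupby(df['account_id']).cumsum()

# The approach from the answer, kept here only for comparison.
valid_cohort = df['cohort_period'].ne(0)
df['cum_revenue'] = (
    valid_cohort.mul(df['revenue'])
    .groupby(df['account_id'])
    .cumsum()
    .where(valid_cohort)
)
print(df)
```

Both columns should agree on this data. Either way, the running total follows the row order within each account, so the frame is assumed to be sorted by cohort_period within account_id, as it is in the sample.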

Regarding python - Shifting a Groupby in Pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47193386/
