gpt4 book ai didi

python - 级联 groupby/转换操作

转载 作者:太空宇宙 更新时间:2023-11-04 11:08:29 24 4
gpt4 key购买 nike

假设我有一组包含日期和值的组和子组。

我最后需要的是按组按月评估值的滚动平均值,窗口 2(当前月份的值是使用过去 2 个月评估的)。

如果我将数据帧减少两个连续的 groupbys,我可以实现:

  • 第一个评估每组值的总和(groupby组|日期,总和)
  • 第二个评估每月滚动平均值分组(groupby group|month, transform, rolling)

但这会减少我的数据。

我需要的是使用转换操作来完成这一切,这样我就可以将结果作为原始数据帧上的一列。

让我们看一下这个虚拟数据:

values = [100, 100, 200, 200, 300, 300]
dates = ['2017-01-01', '2017-02-01',
'2018-01-01', '2018-02-01',
'2019-01-01', '2019-02-01']

df1 = pd.DataFrame({'date': dates, 'value': values})
df1['subgroup'] = 'subgroup1'

df2 = df1.copy()
df2['subgroup'] = 'subgroup2'
df2['value'] = df2.value *2

df_g1 = pd.concat([df1, df2], axis=0)

df_g1['group'] = 'group1'

df_g2 = df_g1.copy()
df_g2['group'] = 'group2'
df_g2['value'] = df_g2.value *2

df = pd.concat([df_g1, df_g2], axis=0)
df['date'] = pd.to_datetime(df.date)

现在第一个groupby操作:

df_total_by_group = df.groupby(['group', 'date'], as_index=False)[['value']].sum()
df_total_by_group['month'] = df_total_by_group['date'].dt.month

现在滚动平均值:

def rolling_mean(serie):
return serie.shift(1).rolling(2, min_periods=1).mean()

df_total_by_group['month_rolling_mean_by_group'] = (df_total_by_group
.groupby(['group', 'month'])['value']
.transform(rolling_mean)
)

# display results
df_total_by_group.sort_values(by=['group', 'month'])

enter image description here

我在这里得到了正确的结果,但我需要它们作为原始数据框中的一列。

我迷路了。有什么建议吗?

最佳答案

使用DataFrame.merge带有列列表 - 这里缺少 on,因为通过两个 DataFrame 的所有公共(public)列的交叉合并:

df = df.merge(df_total_by_group[['group','date','month_rolling_mean_by_group']], how='left')

所以它的工作原理是一样的:

df = df.merge(df_total_by_group[['group','date','month_rolling_mean_by_group']], 
how='left',
on=['group','date'])

print (df)
date value subgroup group month_rolling_mean_by_group
0 2017-01-01 100 subgroup1 group1 NaN
1 2017-01-01 200 subgroup2 group1 NaN
2 2017-02-01 100 subgroup1 group1 NaN
3 2017-02-01 200 subgroup2 group1 NaN
4 2018-01-01 200 subgroup1 group1 300.0
5 2018-01-01 400 subgroup2 group1 300.0
6 2018-02-01 200 subgroup1 group1 300.0
7 2018-02-01 400 subgroup2 group1 300.0
8 2019-01-01 300 subgroup1 group1 450.0
9 2019-01-01 600 subgroup2 group1 450.0
10 2019-02-01 300 subgroup1 group1 450.0
11 2019-02-01 600 subgroup2 group1 450.0
12 2017-01-01 200 subgroup1 group2 NaN
13 2017-01-01 400 subgroup2 group2 NaN
14 2017-02-01 200 subgroup1 group2 NaN
15 2017-02-01 400 subgroup2 group2 NaN
16 2018-01-01 400 subgroup1 group2 600.0
17 2018-01-01 800 subgroup2 group2 600.0
18 2018-02-01 400 subgroup1 group2 600.0
19 2018-02-01 800 subgroup2 group2 600.0
20 2019-01-01 600 subgroup1 group2 900.0
21 2019-01-01 1200 subgroup2 group2 900.0
22 2019-02-01 600 subgroup1 group2 900.0
23 2019-02-01 1200 subgroup2 group2 900.0

如果对第一个sum 使用transform,它的工作方式不同:

df['value'] = df.groupby(['group', 'date'], as_index=False)['value'].transform('sum')
df['month'] = df['date'].dt.month


def rolling_mean(serie):
return serie.shift(1).rolling(2, min_periods=1).mean()

df['month_rolling_mean_by_group'] = (df.groupby(['group', 'month'])['value']
.transform(rolling_mean))

print (df)
date value subgroup group month month_rolling_mean_by_group
0 2017-01-01 300 subgroup1 group1 1 NaN
1 2017-02-01 300 subgroup1 group1 2 NaN
2 2018-01-01 600 subgroup1 group1 1 300.0
3 2018-02-01 600 subgroup1 group1 2 300.0
4 2019-01-01 900 subgroup1 group1 1 450.0
5 2019-02-01 900 subgroup1 group1 2 450.0
0 2017-01-01 300 subgroup2 group1 1 750.0
1 2017-02-01 300 subgroup2 group1 2 750.0
2 2018-01-01 600 subgroup2 group1 1 600.0
3 2018-02-01 600 subgroup2 group1 2 600.0
4 2019-01-01 900 subgroup2 group1 1 450.0
5 2019-02-01 900 subgroup2 group1 2 450.0
0 2017-01-01 600 subgroup1 group2 1 NaN
1 2017-02-01 600 subgroup1 group2 2 NaN
2 2018-01-01 1200 subgroup1 group2 1 600.0
3 2018-02-01 1200 subgroup1 group2 2 600.0
4 2019-01-01 1800 subgroup1 group2 1 900.0
5 2019-02-01 1800 subgroup1 group2 2 900.0
0 2017-01-01 600 subgroup2 group2 1 1500.0
1 2017-02-01 600 subgroup2 group2 2 1500.0
2 2018-01-01 1200 subgroup2 group2 1 1200.0
3 2018-02-01 1200 subgroup2 group2 2 1200.0
4 2019-01-01 1800 subgroup2 group2 1 900.0
5 2019-02-01 1800 subgroup2 group2 2 900.0

关于python - 级联 groupby/转换操作,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/58856701/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com