python - How to apply an aggregate function with a condition on a pivot table in Pandas?

Reposted. Author: 行者123. Updated: 2023-12-01 00:03:31

My DataFrame looks "like" this:

index  name  method     values
0.     A     estimated  4874
1.     A     counted    847
2.     A     estimated  1152
3.     B     estimated  276
4.     B     counted    6542
5.     B     counted    1152
6.     B     estimated  3346
7.     C     counted    7622
8.     C     estimated  26
...

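What I want to do is add up the "estimated" values and the "counted" values separately for each "name". I tried to do this with pivot_table (as in the code below), but I can only do it for one of the methods at a time. Is there a way to handle both methods with the same code?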

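# Per name: pivot over method with an aggfunc that keeps only 'estimated' rows,
# so the margin row gives total_estimated (only one method per run)
count = df.groupby(['name']).apply(
    lambda sub_df: sub_df.pivot_table(
        index=['method'], values=['values'],
        aggfunc={'values': lambda x: x[df.loc[x.index, 'method'] == 'estimated'].sum()},
        margins=True, margins_name='total_estimated'))
count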
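For reference, the per-method totals themselves do not need a conditional aggfunc: passing columns='method' to pivot_table aggregates every method in a single call. A minimal sketch, assuming the sample data above is loaded with a default integer index (the "index" column is left out):

```python
import pandas as pd

# Assumption: the question's sample data, with a default RangeIndex
# instead of the "index" column
df = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "method": ["estimated", "counted", "estimated", "estimated",
               "counted", "counted", "estimated", "counted", "estimated"],
    "values": [4874, 847, 1152, 276, 6542, 1152, 3346, 7622, 26],
})

# One row per name, one column per method; each cell is the sum of "values"
totals = df.pivot_table(index="name", columns="method",
                        values="values", aggfunc="sum")
print(totals)
# counted:   A=847,  B=7694, C=7622
# estimated: A=6026, B=3622, C=26
```

Each cell of this table is one of the total_counted / total_estimated values in the desired output; what remains is reshaping it back into rows.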

What I ultimately want to end up with is something like this:

index  name  method           values
0.     A     estimated        4874
1.     A     counted          847
2.     A     estimated        1152
3.     A     total_counted    847
4.     A     total_estimated  6026
5.     B     estimated        276
6.     B     counted          6542
7.     B     counted          1152
8.     B     estimated        3346
9.     B     total_counted    7694
10.    B     total_estimated  3622
11.    C     counted          7622
12.    C     estimated        26
13.    C     total_counted    7622
14.    C     total_estimated  26
...

Best answer

Use DataFrame.pivot_table to compute the totals, then combine them with the original DataFrame, either with DataFrame.stack + DataFrame.join or with DataFrame.melt + DataFrame.merge:

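# If "index" is a regular column, make it the index first:
# df = df.set_index('index')
new_df = (df.join(df.pivot_table(index='name',       # one row per name
                                 columns='method',   # one column per method
                                 values='values',
                                 aggfunc='sum')      # sum of values per (name, method)
                    .add_prefix('total_')            # counted -> total_counted, ...
                    .stack()                         # Series indexed by (name, method)
                    .rename('new_value'),
                  on=['name', 'method'], how='outer')  # totals become new rows
          # original rows keep their value; the new total rows take new_value
          .assign(values=lambda x: x['values'].fillna(x['new_value']))
          .drop(columns='new_value')
          .sort_values(['name', 'method'])
          )
print(new_df)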
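To see what the join aligns on, it can help to print the intermediate Series on its own. A sketch, again assuming the sample data with a default integer index: after add_prefix and stack, the totals form a Series whose two index levels are named name and method, which is what join(on=['name', 'method']) matches against the two key columns. Since no original row has a total_* method, the outer join appends these six keys as new rows with NaN in values, and fillna then fills them from new_value.

```python
import pandas as pd

# Assumption: the question's sample data, with a default RangeIndex
df = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "method": ["estimated", "counted", "estimated", "estimated",
               "counted", "counted", "estimated", "counted", "estimated"],
    "values": [4874, 847, 1152, 276, 6542, 1152, 3346, 7622, 26],
})

# Wide totals table -> columns relabelled total_<method> -> long Series
totals = (df.pivot_table(index="name", columns="method",
                         values="values", aggfunc="sum")
            .add_prefix("total_")   # counted -> total_counted, estimated -> total_estimated
            .stack()                # Series with a (name, method) MultiIndex
            .rename("new_value"))   # becomes the column name after the join
print(totals)
```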

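# If "index" is a regular column, make it the index first:
# df = df.set_index('index')
new_df = (df.merge(df.pivot_table(index='name',
                                  columns='method',
                                  values='values',
                                  aggfunc='sum')
                     .add_prefix('total_')
                     .T                # rows: total_* methods, columns: names
                     .reset_index()    # method labels become a column
                     .melt('method', value_name='values'),  # long: method, name, values
                   on=['name', 'method'], how='outer')
          # both frames have "values", so merge yields values_x (original) and values_y (totals)
          .assign(values=lambda x: x['values_x'].fillna(x['values_y']))
          .loc[:, df.columns]          # drop the helper columns
          .sort_values(['name', 'method'])
          )
print(new_df)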
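A different technique, not the one used in this answer, avoids the join/merge step altogether: compute the totals as ordinary rows with groupby and append them with pd.concat. A minimal sketch, assuming the same sample data with a default integer index:

```python
import pandas as pd

# Assumption: the question's sample data, with a default RangeIndex
df = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "method": ["estimated", "counted", "estimated", "estimated",
               "counted", "counted", "estimated", "counted", "estimated"],
    "values": [4874, 847, 1152, 276, 6542, 1152, 3346, 7622, 26],
})

# Per-(name, method) sums as regular rows, with "total_" prepended to method
totals = (df.groupby(["name", "method"], as_index=False)["values"].sum()
            .assign(method=lambda t: "total_" + t["method"]))

# Append the total rows and order everything by name, then method
new_df = (pd.concat([df, totals], ignore_index=True)
            .sort_values(["name", "method"])
            .reset_index(drop=True))
print(new_df)
```

Because no NaN is ever introduced, values keeps its integer dtype here, whereas the fillna-based versions return floats (847.0 and so on).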

Output

    name           method  values
2      A          counted   847.0
0      A        estimated  4874.0
1      A        estimated  1152.0
9      A    total_counted   847.0
10     A  total_estimated  6026.0
5      B          counted  6542.0
6      B          counted  1152.0
3      B        estimated   276.0
4      B        estimated  3346.0
11     B    total_counted  7694.0
12     B  total_estimated  3622.0
7      C          counted  7622.0
8      C        estimated    26.0
13     C    total_counted  7622.0
14     C  total_estimated    26.0

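But if I were you, I would use DataFrame.add_suffix instead, so that after sorting each total row appears directly after the rows of the method it sums: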

new_df = (df.join(df.pivot_table(index='name',
                                 columns='method',
                                 values='values',
                                 aggfunc='sum')
                    .add_suffix('_total')            # counted -> counted_total, ...
                    .stack()                         # Series indexed by (name, method)
                    .rename('new_value'),
                  on=['name', 'method'], how='outer')
          .assign(values=lambda x: x['values'].fillna(x['new_value']))
          .drop(columns='new_value')
          .sort_values(['name', 'method'])           # each *_total lands after its method
          )
print(new_df)
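The reason the suffix helps is plain lexicographic ordering of the method labels: with a total_ prefix every total sorts after all the raw methods, while a _total suffix keeps each total next to the method it sums. A quick check in plain Python:

```python
# Method labels as they appear after add_prefix('total_') vs add_suffix('_total')
prefixed = sorted(["counted", "estimated", "total_counted", "total_estimated"])
suffixed = sorted(["counted", "estimated", "counted_total", "estimated_total"])

print(prefixed)  # ['counted', 'estimated', 'total_counted', 'total_estimated']
print(suffixed)  # ['counted', 'counted_total', 'estimated', 'estimated_total']
```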

       name           method  values
index
1.0       A          counted   847.0
8.0       A    counted_total   847.0
0.0       A        estimated  4874.0
2.0       A        estimated  1152.0
8.0       A  estimated_total  6026.0
4.0       B          counted  6542.0
5.0       B          counted  1152.0
8.0       B    counted_total  7694.0
3.0       B        estimated   276.0
6.0       B        estimated  3346.0
8.0       B  estimated_total  3622.0
7.0       C          counted  7622.0
8.0       C    counted_total  7622.0
8.0       C        estimated    26.0
8.0       C  estimated_total    26.0

For more on python - How to apply an aggregate function with a condition on a pivot table in Pandas?, see the similar question on Stack Overflow: https://stackoverflow.com/questions/60149233/
