
python - Pandas: an elegant way to implement groupby + aggregation for a DataFrame with many columns?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:59:48

I have a DataFrame my_df, and I want to create a new DataFrame new_df. Each column of new_df is produced by grouping my_df by my_id and taking the max of one column of my_df.

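Below is my code, which works fine. However, I wonder whether there is a better way, especially since in the future I will be dealing with hundreds of columns rather than just 6. Many thanks!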

tmp_df1 = my_df.groupby(['my_id'], as_index=False).col_A.agg({"max_A": "max"})   
tmp_df2 = my_df.groupby(['my_id'], as_index=False).col_B.agg({"max_B": "max"})
tmp_df3 = my_df.groupby(['my_id'], as_index=False).col_C.agg({"max_C": "max"})
tmp_df4 = my_df.groupby(['my_id'], as_index=False).col_D.agg({"max_D": "max"})
tmp_df5 = my_df.groupby(['my_id'], as_index=False).col_E.agg({"max_E": "max"})
tmp_df6 = my_df.groupby(['my_id'], as_index=False).col_F.agg({"max_F": "max"})

combine_df1 = pd.merge(tmp_df1,tmp_df2,how="inner",on=['my_id'])
combine_df2 = pd.merge(combine_df1,tmp_df3,how="inner",on=['my_id'])
combine_df3 = pd.merge(combine_df2,tmp_df4,how="inner",on=['my_id'])
combine_df4 = pd.merge(combine_df3,tmp_df5,how="inner",on=['my_id'])
new_df = pd.merge(combine_df4,tmp_df6,how="inner",on=['my_id'])

Best answer

Consider the example DataFrame my_df

import numpy as np
import pandas as pd

# Seed the global NumPy RNG so the example is repeatable
np.random.seed([3,1415])

my_df = pd.DataFrame(dict(
    my_id=list('WXYZ') * 4,
    col_A=np.random.randint(10, size=16),
    col_B=np.random.randint(10, size=16),
    col_C=np.random.randint(10, size=16),
    col_D=np.random.randint(10, size=16),
    col_E=np.random.randint(10, size=16),
    col_F=np.random.randint(10, size=16),
))

my_df

    col_A  col_B  col_C  col_D  col_E  col_F  my_id
0       0      3      6      1      9      3      W
1       2      2      4      5      8      5      X
2       7      4      7      2      0      8      Y
3       3      3      6      8      4      5      Z
4       8      3      2      2      3      1      W
5       7      6      6      4      0      5      X
6       0      7      6      7      4      1      Y
7       6      7      5      6      1      4      Z
8       8      4      2      9      5      3      W
9       6      5      8      4      8      9      X
10      0      3      7      2      6      5      Y
11      2      7      5      4      0      5      Z
12      0      5      8      6      8      7      W
13      4      9      4      3      7      0      X
14      9      8      7      8      4      3      Y
15      7      7      6      3      6      2      Z

You just want to use groupby with max

new_df = my_df.groupby('my_id', as_index=False).max()
new_df

   my_id  col_A  col_B  col_C  col_D  col_E  col_F
0      W      8      5      8      9      9      7
1      X      7      9      8      5      8      9
2      Y      9      8      7      8      6      8
3      Z      7      7      6      8      6      5
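
With hundreds of columns, some of them may not be meant for aggregation. A minimal sketch of one way to handle that, assuming the value columns share the col_ prefix as in the example: select the columns on the groupby object before calling max. The small frame, the extra note column, and the name value_cols are hypothetical, made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of my_df, plus a column we do not want to aggregate
my_df = pd.DataFrame({
    "my_id": ["W", "X", "W", "X"],
    "col_A": [0, 2, 8, 7],
    "col_B": [3, 2, 3, 6],
    "note": ["a", "b", "c", "d"],
})

# Assumption: every column to aggregate starts with "col_"
value_cols = [c for c in my_df.columns if c.startswith("col_")]

# Restrict the groupby to those columns, then take the per-group max
new_df = my_df.groupby("my_id", as_index=False)[value_cols].max()
print(new_df)
```

The list of columns can be built any way that suits the data (a prefix test, a dtype check, an explicit list); the groupby call itself stays the same however many columns there are.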

If you want to change the column names:

my_df.groupby('my_id').max() \
    .rename(columns=lambda x: x.replace('col_', '')) \
    .add_suffix('_max').reset_index()

   my_id  A_max  B_max  C_max  D_max  E_max  F_max
0      W      8      5      8      9      9      7
1      X      7      9      8      5      8      9
2      Y      9      8      7      8      6      8
3      Z      7      7      6      8      6      5

Or

my_df.groupby('my_id').max() \
    .rename(columns=lambda x: x.rsplit('_', 1)[1] + '_max').reset_index()

   my_id  A_max  B_max  C_max  D_max  E_max  F_max
0      W      8      5      8      9      9      7
1      X      7      9      8      5      8      9
2      Y      9      8      7      8      6      8
3      Z      7      7      6      8      6      5
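
To get the max_A style names used in the question rather than A_max, one option is named aggregation (available in pandas 0.25 and later), with the keyword arguments built in a dict comprehension. This is only a sketch on a hypothetical small frame; value_cols and named are illustrative names, and it assumes every column other than my_id should be aggregated.

```python
import pandas as pd

# Hypothetical miniature of my_df
my_df = pd.DataFrame({
    "my_id": ["W", "X", "W", "X"],
    "col_A": [0, 2, 8, 7],
    "col_B": [3, 2, 3, 6],
})

# Assumption: aggregate every column except the grouping key
value_cols = [c for c in my_df.columns if c != "my_id"]

# Map each output name to (input column, function): "col_A" -> "max_A"
named = {"max_" + c.split("_", 1)[1]: (c, "max") for c in value_cols}

# Named aggregation: one keyword argument per output column
new_df = my_df.groupby("my_id", as_index=False).agg(**named)
print(new_df)
```

Because each entry in the dict carries its own function, the same pattern extends to mixing aggregations, for example max for some columns and min for others.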

Regarding python - Pandas: an elegant way to implement groupby + aggregation for a DataFrame with many columns?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45203475/
