
python - Merge duplicate rows in Pandas, combining certain row values

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:28:51

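I have a DataFrame of football players. When a player transfers in the middle of a season, I end up with duplicate rows for him. My goal is to sum the points accumulated in the two leagues and merge them into a single row.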

Here is a sample of the data:

     name        full_name                      club               Points  Start  Sub
84   S. Mustafi  Shkodran Mustafi               Arsenal                76     26    1
85   S. Mustafi  Shkodran Mustafi               Arsenal                -2      0    1
89   Bruno       Bruno Soriano Llido            Villarreal CF          43     15   16
90   Bruno       Bruno Gonzalez Cabrera         Getafe CF              43     15   16
119  Oscar       Oscar dos Santos Emboaba       NaN                    16      5    8
120  Oscar       Oscar dos Santos Emboaba       NaN                     1      0    2
121  Oscar       Oscar Rodriguez Arnaiz         Real Madrid CF         16      5    8
122  Oscar       Oscar Rodriguez Arnaiz         Real Madrid CF          1      0    2
188  C. Bravo    Claudio Bravo                  Manchester City        61     22    8
189  C. Bravo    Claudio Bravo                  Manchester City         1      1    0
193  Naldo       Ronaldo Aparecido Rodrigues    FC Schalke 04          58     19    1
194  Naldo       Edinaldo Gomes Pereira         RCD Espanyol           58     19    1
200  G. Castro   Gonzalo Castro                 Borussia Dortmund      79     23    6
201  G. Castro   Gonzalo Castro                 Malaga CF              79     23    6
209  Juanfran    Juan Francisco Torres Belen    Atletico Madrid        86     21    8
210  Juanfran    Juan Francisco Torres Belen    Atletico Madrid        74     34    2
211  Juanfran    Juan Francisco Moreno Fuertes  RC Coruna              86     21    8
212  Juanfran    Juan Francisco Moreno Fuertes  RC Coruna              74     34    2

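In my target DataFrame, the Points, Start and Sub values of a player like Mustafi are added together so that he appears in only one row. Players like Bruno are clearly not the same person, so I do not want the two Brunos added together.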

     name        full_name                      club               Points  Start  Sub
84   S. Mustafi  Shkodran Mustafi               Arsenal                74     26    2
89   Bruno       Bruno Soriano Llido            Villarreal CF          43     15   16
90   Bruno       Bruno Gonzalez Cabrera         Getafe CF              43     15   16
119  Oscar       Oscar dos Santos Emboaba       NaN                    17      5   10
121  Oscar       Oscar Rodriguez Arnaiz         Real Madrid CF         17      5   10
188  C. Bravo    Claudio Bravo                  Manchester City        62     23    8
193  Naldo       Ronaldo Aparecido Rodrigues    FC Schalke 04          58     19    1
194  Naldo       Edinaldo Gomes Pereira         RCD Espanyol           58     19    1
200  G. Castro   Gonzalo Castro                 Borussia Dortmund     158     46   12
209  Juanfran    Juan Francisco Torres Belen    Atletico Madrid        86     21    8
212  Juanfran    Juan Francisco Moreno Fuertes  RC Coruna              74     34    2

Any help would be great!

Best answer

You need:

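import pandas as pd

# Replace missing text values so that no rows are omitted by groupby
df[['name','full_name','club']] = df[['name','full_name','club']].fillna('')
# Sum the numeric columns and keep the first club seen for each player
d = {'Points':'sum', 'Start':'sum', 'Sub':'sum', 'club':'first'}
df = (df.groupby(['name','full_name'], sort=False, as_index=False)
        .agg(d)
        .reindex(columns=df.columns))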

with pd.option_context('display.expand_frame_repr', False):
    print(df)

     name        full_name                      club               Points  Start  Sub
0    S. Mustafi  Shkodran Mustafi               Arsenal                74     26    2
1    Bruno       Bruno Soriano Llido            Villarreal CF          43     15   16
2    Bruno       Bruno Gonzalez Cabrera         Getafe CF              43     15   16
3    Oscar       Oscar dos Santos Emboaba                              17      5   10
4    Oscar       Oscar Rodriguez Arnaiz         Real Madrid CF         17      5   10
5    C. Bravo    Claudio Bravo                  Manchester City        62     23    8
6    Naldo       Ronaldo Aparecido Rodrigues    FC Schalke 04          58     19    1
7    Naldo       Edinaldo Gomes Pereira         RCD Espanyol           58     19    1
8    G. Castro   Gonzalo Castro                 Borussia Dortmund     158     46   12
9    Juanfran    Juan Francisco Torres Belen    Atletico Madrid       160     55   10
10   Juanfran    Juan Francisco Moreno Fuertes  RC Coruna             160     55   10
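
To try this without the original file, here is a minimal self-contained sketch. It assumes a hand-typed subset of the rows from the question (Mustafi, the two Brunos and one Oscar); the variable names `text_cols`, `aggregations` and `merged` are made up for illustration and are not part of the answer.

```python
import pandas as pd

# Hand-typed subset of the question's rows (an assumption for illustration).
df = pd.DataFrame(
    {
        "name": ["S. Mustafi", "S. Mustafi", "Bruno", "Bruno", "Oscar", "Oscar"],
        "full_name": [
            "Shkodran Mustafi", "Shkodran Mustafi",
            "Bruno Soriano Llido", "Bruno Gonzalez Cabrera",
            "Oscar dos Santos Emboaba", "Oscar dos Santos Emboaba",
        ],
        "club": ["Arsenal", "Arsenal", "Villarreal CF", "Getafe CF", None, None],
        "Points": [76, -2, 43, 43, 16, 1],
        "Start": [26, 0, 15, 15, 5, 0],
        "Sub": [1, 1, 16, 16, 8, 2],
    },
    index=[84, 85, 89, 90, 119, 120],
)

# Replace missing text values with empty strings before grouping.
text_cols = ["name", "full_name", "club"]
df[text_cols] = df[text_cols].fillna("")

# Sum the counters, keep the first club, and group on both name columns
# so that two different players who share a short name stay separate.
aggregations = {"Points": "sum", "Start": "sum", "Sub": "sum", "club": "first"}
merged = (
    df.groupby(["name", "full_name"], sort=False, as_index=False)
      .agg(aggregations)
      .reindex(columns=df.columns)  # restore the original column order
)

print(merged)
```

Grouping on both `name` and `full_name` is what keeps the two Brunos apart: they share a `name` but not a `full_name`, so they land in different groups.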

Explanation:

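  1. First, use fillna to replace NaNs with '' so that no rows are omitted by groupby.
  2. Then groupby and aggregate: agg takes a dictionary that maps each column to its aggregation function.
  3. Finally, use with and pd.option_context to temporarily print the wide frame without wrapping its columns.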
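
The first step matters only when a grouping key is missing. A small sketch of that behaviour follows, assuming a hypothetical frame in which `full_name` is NaN for one player (this is not the question's data, where only `club` is missing). The `dropna=False` variant is an alternative available in pandas 1.1 and later, not something the answer uses.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which one player's full_name is missing.
df = pd.DataFrame(
    {
        "full_name": [np.nan, np.nan, "Bruno Soriano Llido"],
        "Points": [16, 1, 43],
    }
)

# Default behaviour: rows whose key is NaN are left out of the result.
dropped = df.groupby("full_name", sort=False, as_index=False)["Points"].sum()

# Filling the key first keeps those rows, as the answer does.
filled = df.assign(full_name=df["full_name"].fillna(""))
kept = filled.groupby("full_name", sort=False, as_index=False)["Points"].sum()

# Alternative: keep NaN as its own group (pandas 1.1 and later).
kept_nan = df.groupby(
    "full_name", sort=False, as_index=False, dropna=False
)["Points"].sum()

print(len(dropped), len(kept), len(kept_nan))
```

In this sketch `dropped` loses the two rows with the missing key, while `kept` and `kept_nan` both retain them as one summed group.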

For python - Merge duplicate rows in Pandas, combining certain row values, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49709405/
