
python - How to create custom columns in pandas by grouping and then aggregating

Repost. Author: 行者123. Updated: 2023-11-30 22:11:42

I have a DataFrame in the following format:

| User          | CodeID        | Language |
| ------------- |---------------| -------- |
| foo | 1 | C |
| foo | 2 | C |
| foo | 3 | CPP |
| bar | 4 | C |
| bar | 5 | CPP |
| bar | 6 | Java |
| bar | 7 | CPP |

What I want is to count the number of codes per language for each user. That is, I want a DataFrame in the following format:

| User | C  | CPP | Java | Total |
| ---- | -- | --- | ---- | ----- |
| foo | 2 | 1 | 0 | 3 |
| bar | 1 | 2 | 1 | 4 |

Note that the number of languages is dynamic. However, a solution that only works for a fixed set of languages would also be fine. Thanks in advance.

Best answer

You can use `pd.crosstab` and then compute the total across each row:

    In [223]: pd.crosstab(df.User, df.Language).assign(Total=lambda x: x.sum(axis=1))
    Out[223]:
    Language  C  CPP  Java  Total
    User
    bar       1    2     1      4
    foo       2    1     0      3
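
For reference, here is a self-contained sketch of the crosstab approach. It rebuilds the sample DataFrame from the question, so the data literal is an assumption based on the table shown there. It also flattens the result into the `User | C | CPP | Java | Total` layout the question asks for. The variable names `counts` and `result` are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "User": ["foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "CodeID": [1, 2, 3, 4, 5, 6, 7],
    "Language": ["C", "C", "CPP", "C", "CPP", "Java", "CPP"],
})

# One row per user, one column per language; missing pairs count as 0.
counts = pd.crosstab(df["User"], df["Language"])

# Row-wise sum gives the per-user total.
counts["Total"] = counts.sum(axis=1)

# Turn the "User" index back into a regular column and drop the
# leftover "Language" label on the column axis.
result = counts.reset_index()
result.columns.name = None

print(result)
```

Because `crosstab` derives the columns from whatever values appear in `Language`, this should keep working when the set of languages changes.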

Or

    In [247]: df.pivot_table(index='User', columns='Language', values='CodeID',
                             aggfunc=len).assign(Total=lambda x: x.sum(1))
    Out[247]:
    Language    C  CPP  Java  Total
    User
    bar       1.0  2.0   1.0    4.0
    foo       2.0  1.0   NaN    3.0
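
The `pivot_table` output above contains `NaN` and floats where a user has no code in a language. A possible variation, sketched here under the assumption that zeros and integer counts are wanted, passes `fill_value=0` and casts the result to `int`. The sample data is reconstructed from the question, and the explicit cast is a defensive choice rather than part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "User": ["foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "CodeID": [1, 2, 3, 4, 5, 6, 7],
    "Language": ["C", "C", "CPP", "C", "CPP", "Java", "CPP"],
})

# Count CodeID entries per (User, Language); fill missing pairs with 0.
counts = df.pivot_table(
    index="User",
    columns="Language",
    values="CodeID",
    aggfunc="count",
    fill_value=0,
)

# Cast explicitly so the counts are integers regardless of how the
# installed pandas version handles dtypes after filling.
counts = counts.astype(int)

# Add the per-user total as a new column.
counts = counts.assign(Total=counts.sum(axis=1))

print(counts)
```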

Or

    In [250]: df.groupby(['User', 'Language']).size().unstack(fill_value=0)
    Out[250]:
    Language  C  CPP  Java
    User
    bar       1    2     1
    foo       2    1     0
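
The `groupby` variant above stops short of the `Total` column. A minimal sketch of one way to finish it follows, again using sample data reconstructed from the question. The `reindex` step, which restores the users' order of first appearance (`foo` before `bar`), is an optional addition and not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "User": ["foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "CodeID": [1, 2, 3, 4, 5, 6, 7],
    "Language": ["C", "C", "CPP", "C", "CPP", "Java", "CPP"],
})

# Count rows per (User, Language), then pivot languages into columns.
counts = df.groupby(["User", "Language"]).size().unstack(fill_value=0)

# Add the per-user total.
counts["Total"] = counts.sum(axis=1)

# Optional: groupby sorts users alphabetically, so restore the order
# in which users first appear in the input.
counts = counts.reindex(df["User"].unique())

print(counts)
```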

Regarding "python - How to create custom columns in pandas by grouping and then aggregating", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51347059/
