python - How to add a column containing aggregated information over rows?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:58:14

I have the following dataframe:

import pandas as pd

# Create a dataframe
raw_data = {'trial_num': ['1', '1', '2', '2', '3', '3'],
            'area': ['first', 'second', 'first', 'second', 'first', 'second'],
            'counts': [10, 25, 36, 2, 70, 10]}

df = pd.DataFrame(raw_data, columns=['trial_num', 'area', 'counts'])

  trial_num    area  counts
0         1   first      10
1         1  second      25
2         2   first      36
3         2  second       2
4         3   first      70
5         3  second      10

I want to add a new column, "proportion", that expresses each count as a proportion of the total for its trial_num. Like this:

  trial_num    area  counts  total_count  proportion
0         1   first      10           35  0.2857142857142857
1         1  second      25           35  0.7142857142857143
2         2   first      36           38  0.9473684210526315
3         2  second       2           38  0.05263157894736842
4         3   first      70           80  0.875
5         3  second      10           80  0.125

I have only got this far:

df.counts.groupby(df.trial_num).sum()

trial_num
1 35
2 38
3 80

Is there an efficient way to do this without breaking up the dataframe? Please help.

Best answer

You can use div to divide by the Series created by GroupBy.transform, which has the same length as the original df:

df['proportion'] = df['counts'].div(df.groupby(['trial_num'])['counts'].transform('sum'))
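To see why this works, here is a minimal sketch that rebuilds the question's dataframe and stores the transformed totals in their own column before dividing. The total_count column name is taken from the desired output in the question; the answer itself does not create it, so treat that step as an optional addition.

```python
import pandas as pd

df = pd.DataFrame({
    'trial_num': ['1', '1', '2', '2', '3', '3'],
    'area': ['first', 'second', 'first', 'second', 'first', 'second'],
    'counts': [10, 25, 36, 2, 70, 10],
})

# transform('sum') computes one total per trial_num group and repeats it
# on every row of that group, so the result lines up with df's index.
df['total_count'] = df.groupby('trial_num')['counts'].transform('sum')

# Row-wise division is now a plain column operation.
df['proportion'] = df['counts'] / df['total_count']

print(df)
```

Because the totals keep the original index, no merge or reshaping is needed, and the dataframe stays intact.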

Alternative: map:

s = df.groupby(['trial_num'])['counts'].sum()
df['proportion'] = df['counts'].div(df['trial_num'].map(s))
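The map variant can be unpacked the same way. In this sketch, the intermediate names totals, via_map and via_transform are illustrative only and do not appear in the original answer; the dataframe is rebuilt from the question so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'trial_num': ['1', '1', '2', '2', '3', '3'],
    'area': ['first', 'second', 'first', 'second', 'first', 'second'],
    'counts': [10, 25, 36, 2, 70, 10],
})

# One total per group, indexed by trial_num (3 rows, not 6).
s = df.groupby('trial_num')['counts'].sum()

# map looks up each row's trial_num in s, stretching the totals to df's length.
totals = df['trial_num'].map(s)

via_map = df['counts'].div(totals)
via_transform = df['counts'].div(
    df.groupby('trial_num')['counts'].transform('sum')
)

# Both routes should yield the same proportions.
print((via_map - via_transform).abs().max())
```

The map route is handy when the aggregated Series s is also needed elsewhere; otherwise transform does the lookup in a single step.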
Both approaches produce the same result:

print(df)
  trial_num    area  counts  proportion
0         1   first      10    0.285714
1         1  second      25    0.714286
2         2   first      36    0.947368
3         2  second       2    0.052632
4         3   first      70    0.875000
5         3  second      10    0.125000

Source: a similar question, "python - How to add a column containing aggregated information over rows?", on Stack Overflow: https://stackoverflow.com/questions/49449259/
