
python - How to summarize different groupby combinations?

Reposted. Author: 行者123. Updated: 2023-11-28 20:00:48

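I'm compiling a table of the top 3 farm crops by county. Some counties grow the same crops in the same order. Other counties grow the same crops, but in a different order.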

import pandas as pd

df1 = pd.DataFrame({
    "County": ["Harney", "Baker", "Wheeler", "Hood River", "Wasco", "Morrow", "Union", "Lake"],
    "Crop1": ["grain", "melons", "melons", "apples", "pears", "raddish", "pears", "pears"],
    "Crop2": ["melons", "grain", "grain", "melons", "carrots", "pears", "carrots", "carrots"],
    "Crop3": ["apples", "apples", "apples", "grain", "raddish", "carrots", "raddish", "raddish"],
    "Total_pop": [2000, 1500, 3000, 1500, 2000, 2500, 2700, 2000]})

I can group by Crop1, Crop2 and Crop3 and get the sum of Total_pop:

df1_grouped = df1.groupby(["Crop1", "Crop2", "Crop3"])["Total_pop"].sum().reset_index()

This gives the total for each specific (ordered) crop combination:

df1_grouped
     Crop1    Crop2    Crop3  Total_pop
0   apples   melons    grain       1500
1    grain   melons   apples       2000
2   melons    grain   apples       4500
3    pears  carrots  raddish       6700
4  raddish    pears  carrots       2500
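A quick way to see the problem is to count the keys. The sketch below (a self-contained copy of the question's data; the names ordered and unordered_sets are illustrative only) compares the number of ordered groupby keys with the number of distinct crop sets once order is ignored, using frozenset because it is hashable and order-insensitive.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "County": ["Harney", "Baker", "Wheeler", "Hood River", "Wasco", "Morrow", "Union", "Lake"],
    "Crop1": ["grain", "melons", "melons", "apples", "pears", "raddish", "pears", "pears"],
    "Crop2": ["melons", "grain", "grain", "melons", "carrots", "pears", "carrots", "carrots"],
    "Crop3": ["apples", "apples", "apples", "grain", "raddish", "carrots", "raddish", "raddish"],
    "Total_pop": [2000, 1500, 3000, 1500, 2000, 2500, 2700, 2000],
})
cols = ["Crop1", "Crop2", "Crop3"]

# Grouping on the ordered columns: every permutation becomes its own key.
ordered = df1.groupby(cols)["Total_pop"].sum()

# Distinct crop sets when the column order is ignored.
unordered_sets = {frozenset(row) for row in df1[cols].itertuples(index=False)}

print(len(ordered))         # 5
print(len(unordered_sets))  # 2
```

Five ordered keys collapse to only two underlying crop sets, which is exactly the gap between the grouped output above and the desired result.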

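What I want, though, is the total population for each distinct crop combination, regardless of whether a given crop is listed as Crop1, Crop2 or Crop3. The desired result looks like this: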

apples  melons   grain    8000
pears   carrots  raddish  9200

Thanks for any guidance.

Best answer

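Since your data appears to guarantee 3 unique crops per county ("I'm compiling a table of the top 3 crops by county."), sorting the values within each row is enough: every permutation of the same three crops then appears in one canonical (alphabetical) order.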

import numpy as np

cols = ['Crop1', 'Crop2', 'Crop3']
# Sort each row's crop names alphabetically (axis=1 sorts across the columns of a row)
df1[cols] = np.sort(df1[cols].to_numpy(), axis=1)

       County    Crop1  Crop2    Crop3  Total_pop
0      Harney   apples  grain   melons       2000
1       Baker   apples  grain   melons       1500
2     Wheeler   apples  grain   melons       3000
3  Hood River   apples  grain   melons       1500
4       Wasco  carrots  pears  raddish       2000
5      Morrow  carrots  pears  raddish       2500
6       Union  carrots  pears  raddish       2700
7        Lake  carrots  pears  raddish       2000
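The same canonical-order idea can be shown without NumPy or pandas: sorting the three crops of a row and using the result as a dictionary key makes every permutation land on the same key. A minimal plain-Python sketch, with the rows copied by hand from the question's df1 (the names rows and totals are hypothetical):

```python
from collections import defaultdict

# Rows from the question: (county, crop1, crop2, crop3, total_pop).
rows = [
    ("Harney", "grain", "melons", "apples", 2000),
    ("Baker", "melons", "grain", "apples", 1500),
    ("Wheeler", "melons", "grain", "apples", 3000),
    ("Hood River", "apples", "melons", "grain", 1500),
    ("Wasco", "pears", "carrots", "raddish", 2000),
    ("Morrow", "raddish", "pears", "carrots", 2500),
    ("Union", "pears", "carrots", "raddish", 2700),
    ("Lake", "pears", "carrots", "raddish", 2000),
]

totals = defaultdict(int)
for county, *crops, pop in rows:
    # Sorting gives every permutation of the same crops one canonical key.
    key = tuple(sorted(crops))
    totals[key] += pop

for key, pop in sorted(totals.items()):
    print(key, pop)
# ('apples', 'grain', 'melons') 8000
# ('carrots', 'pears', 'raddish') 9200
```

np.sort(..., axis=1) does the same per-row sort for the whole array in one vectorized call, which is why the answer prefers it over a Python loop.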

Then group and sum:

# Select Total_pop explicitly so the string column County is not summed
df1.groupby(cols)[['Total_pop']].sum()

#                        Total_pop
# Crop1   Crop2 Crop3
# apples  grain melons        8000
# carrots pears raddish       9200
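For reference, the whole answer as one self-contained script. One detail goes slightly beyond the original: Total_pop is selected explicitly before summing, because recent pandas versions no longer silently drop non-numeric columns in groupby().sum() and would otherwise try to concatenate the County strings.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "County": ["Harney", "Baker", "Wheeler", "Hood River", "Wasco", "Morrow", "Union", "Lake"],
    "Crop1": ["grain", "melons", "melons", "apples", "pears", "raddish", "pears", "pears"],
    "Crop2": ["melons", "grain", "grain", "melons", "carrots", "pears", "carrots", "carrots"],
    "Crop3": ["apples", "apples", "apples", "grain", "raddish", "carrots", "raddish", "raddish"],
    "Total_pop": [2000, 1500, 3000, 1500, 2000, 2500, 2700, 2000],
})
cols = ["Crop1", "Crop2", "Crop3"]

# Put each row's crops into alphabetical order so permutations collapse.
df1[cols] = np.sort(df1[cols].to_numpy(), axis=1)

# Group on the now-canonical crop columns and sum only the population.
result = df1.groupby(cols)[["Total_pop"]].sum()
print(result)
```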

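The benefit is that you avoid Series.apply and .apply(axis=1), which call Python code once per row. On larger DataFrames the performance difference is substantial: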

df1 = pd.concat([df1]*10000, ignore_index=True)

cols = ['Crop1', 'Crop2', 'Crop3']
%timeit df1[cols] = np.sort(df1[cols].to_numpy(), axis=1)
#36.1 ms ± 399 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

to_sum = ['Crop1', 'Crop2', 'Crop3']
%timeit df1[to_sum] = pd.DataFrame(df1.loc[:, to_sum].apply(set, axis=1).apply(list).values.tolist(), columns=to_sum)
#1.41 s ± 51.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
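%timeit is an IPython magic, so the comparison above only runs in IPython or Jupyter. A rough equivalent with the standard timeit module is sketched below. Assumptions: the helper names sort_rows and set_rows are hypothetical; each one works on a copy, so every run starts from unsorted data (unlike the in-place %timeit lines above, where every loop after the first sorts rows that are already sorted); absolute timings depend on your machine and library versions. The set-based version is timed only: Python does not guarantee set iteration order, so it is not a reliable way to canonicalize rows.

```python
import timeit

import numpy as np
import pandas as pd

cols = ["Crop1", "Crop2", "Crop3"]
base = pd.DataFrame({
    "County": ["Harney", "Baker", "Wheeler", "Hood River", "Wasco", "Morrow", "Union", "Lake"],
    "Crop1": ["grain", "melons", "melons", "apples", "pears", "raddish", "pears", "pears"],
    "Crop2": ["melons", "grain", "grain", "melons", "carrots", "pears", "carrots", "carrots"],
    "Crop3": ["apples", "apples", "apples", "grain", "raddish", "carrots", "raddish", "raddish"],
    "Total_pop": [2000, 1500, 3000, 1500, 2000, 2500, 2700, 2000],
})
# 80,000 rows, matching the size used in the answer's benchmark.
big = pd.concat([base] * 10_000, ignore_index=True)


def sort_rows(df):
    """Vectorized: one np.sort call over the whole crop block."""
    out = df.copy()
    out[cols] = np.sort(out[cols].to_numpy(), axis=1)
    return out


def set_rows(df):
    """Row-wise: build a Python set per row, then rebuild the columns."""
    out = df.copy()
    as_lists = out[cols].apply(set, axis=1).apply(list).tolist()
    out[cols] = pd.DataFrame(as_lists, columns=cols, index=out.index)
    return out


# Average the fast version over a few runs; the slow one runs once.
t_sort = timeit.timeit(lambda: sort_rows(big), number=3) / 3
t_set = timeit.timeit(lambda: set_rows(big), number=1)
print(f"np.sort:   {t_sort * 1000:.1f} ms per run")
print(f"set/apply: {t_set * 1000:.1f} ms per run")
```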

A similar question, "python - How to summarize different groupby combinations?", can be found on Stack Overflow: https://stackoverflow.com/questions/54737348/
