
python - `nth` breaks a sorted DataFrame in pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:05:39

I have the following DataFrame, census_df, which contains US population data:

         STNAME             CTYNAME  CENSUS2010POP
   0    Alabama      Autauga County          54571
   1    Alabama      Baldwin County         182265
   2    Alabama      Barbour County          27457
   3    Alabama         Bibb County          22915
   4    Alabama       Blount County          57322
   5    Alabama      Bullock County          10914
   6    Alabama       Butler County          20947
   7    Alabama      Calhoun County         118572
   8    Alabama     Chambers County          34215
   9    Alabama     Cherokee County          25989
  10    Alabama      Chilton County          43643
  11    Alabama      Choctaw County          13859
  12    Alabama       Clarke County          25833
  13    Alabama         Clay County          13932
  14    Alabama     Cleburne County          14972
  15    Alabama       Coffee County          49948
  16    Alabama      Colbert County          54428
  17    Alabama      Conecuh County          13228
  18    Alabama        Coosa County          11539
  19    Alabama    Covington County          37765
  20    Alabama     Crenshaw County          13906
  21    Alabama      Cullman County          80406
  22    Alabama         Dale County          50251
  23    Alabama       Dallas County          43820
  24    Alabama       DeKalb County          71109
  25    Alabama       Elmore County          79303
  26    Alabama     Escambia County          38319
  27    Alabama       Etowah County         104430
  28    Alabama      Fayette County          17241
  29    Alabama     Franklin County          31704
 ...        ...                 ...            ...
3112  Wisconsin     Washburn County          15911
3113  Wisconsin   Washington County         131887
3114  Wisconsin     Waukesha County         389891
3115  Wisconsin      Waupaca County          52410
3116  Wisconsin     Waushara County          24496
3117  Wisconsin    Winnebago County         166994
3118  Wisconsin         Wood County          74749
3119    Wyoming       Albany County          36299
3120    Wyoming     Big Horn County          11668
3121    Wyoming     Campbell County          46133
3122    Wyoming       Carbon County          15885
3123    Wyoming     Converse County          13833
3124    Wyoming        Crook County           7083
3125    Wyoming      Fremont County          40123
3126    Wyoming       Goshen County          13249
3127    Wyoming  Hot Springs County           4812
3128    Wyoming      Johnson County           8569
3129    Wyoming      Laramie County          91738
3130    Wyoming      Lincoln County          18106
3131    Wyoming      Natrona County          75450
3132    Wyoming     Niobrara County           2484
3133    Wyoming         Park County          28205
3134    Wyoming       Platte County           8667
3135    Wyoming     Sheridan County          29116
3136    Wyoming     Sublette County          10247
3137    Wyoming   Sweetwater County          43806
3138    Wyoming        Teton County          21294
3139    Wyoming        Uinta County          21118
3140    Wyoming     Washakie County           8533
3141    Wyoming       Weston County           7208

[3142 rows x 3 columns]

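The columns hold the state name, the county name, and the population. I am trying to find the three most populous counties in each state and then sum their populations, so that I get one number per state. To get the most populous counties of each state, I tried the following: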

# Sort all the counties by population, largest first
census_df = census_df.sort_values(by='CENSUS2010POP', ascending=False).reset_index(drop=True)

# Group the counties by state and take the first 3 rows of each state
group = census_df.groupby('STNAME').nth([0, 1, 2])
print(group.tail())

This gives me the following (only the last few rows are shown):

           CENSUS2010POP          CTYNAME
STNAME
Wisconsin         488073      Dane County
Wisconsin         389891  Waukesha County
Wyoming            91738   Laramie County
Wyoming            46133  Campbell County
Wyoming            75450   Natrona County

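As you can see, for the last state, Wyoming, the population order is scrambled after calling nth: Campbell County (46133) comes before Natrona County (75450). The same happens for many other states. Can someone explain what is going on, and how to keep the sorted order when selecting the top three rows?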
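A likely explanation, hedged because it depends on the pandas version: in releases from around the time of this question, `nth` with the default `as_index=True` moved the group key into the index and then sorted the result by that key with `sort_index()`, whose default quicksort is not stable, so rows inside one state could lose their population order. Since pandas 2.0, `nth` behaves as a filter and returns the selected rows in their original order with the original index. The sketch below avoids relying on either behavior: it sorts on state and population together, then takes `groupby(...).head(3)`, which is a filter that keeps the existing row order within each group. The small `census_df` built here is a hypothetical subset of the rows shown above, not the full file.

```python
import pandas as pd

# Hypothetical subset of the rows shown in the question (not the full 3142-row file).
rows = [
    ("Alabama", "Autauga County", 54571),
    ("Alabama", "Baldwin County", 182265),
    ("Alabama", "Bullock County", 10914),
    ("Alabama", "Calhoun County", 118572),
    ("Alabama", "Etowah County", 104430),
    ("Wisconsin", "Washburn County", 15911),
    ("Wisconsin", "Washington County", 131887),
    ("Wisconsin", "Waukesha County", 389891),
    ("Wisconsin", "Winnebago County", 166994),
    ("Wisconsin", "Wood County", 74749),
    ("Wyoming", "Albany County", 36299),
    ("Wyoming", "Campbell County", 46133),
    ("Wyoming", "Laramie County", 91738),
    ("Wyoming", "Natrona County", 75450),
    ("Wyoming", "Teton County", 21294),
]
census_df = pd.DataFrame(rows, columns=["STNAME", "CTYNAME", "CENSUS2010POP"])

# Sort by state (A to Z), then by population (largest first) inside each state.
ordered = census_df.sort_values(["STNAME", "CENSUS2010POP"], ascending=[True, False])

# head(3) keeps the first three rows of each state in their current order,
# so the population order from the sort above survives.
top3 = ordered.groupby("STNAME").head(3).reset_index(drop=True)
print(top3)
```

Sorting on both columns makes the desired order explicit in the data itself, so the result no longer depends on how a particular pandas version orders the output of `nth`.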

Best answer

您可以使用 groupbySeriesGroupBy.nlargest什么比 .sort_values(ascending=False).head(n) 更快:

print(census_df.set_index('CTYNAME')
      .groupby('STNAME')['CENSUS2010POP']
      .nlargest(3)
      .sort_index(ascending=False)
      .reset_index())

      STNAME            CTYNAME  CENSUS2010POP
0    Wyoming     Natrona County          75450
1    Wyoming     Laramie County          91738
2    Wyoming    Campbell County          46133
3  Wisconsin   Winnebago County         166994
4  Wisconsin    Waukesha County         389891
5  Wisconsin  Washington County         131887
6    Alabama      Etowah County         104430
7    Alabama     Calhoun County         118572
8    Alabama     Baldwin County         182265
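If the goal is to keep the population order inside each state, one option is to sort the reset result on both columns instead of calling sort_index. This is a minimal sketch on a hypothetical subset of the question's rows. The descending state order only mirrors the answer's output and is otherwise an arbitrary choice.

```python
import pandas as pd

# Hypothetical subset of the rows shown in the question.
rows = [
    ("Alabama", "Autauga County", 54571),
    ("Alabama", "Baldwin County", 182265),
    ("Alabama", "Bullock County", 10914),
    ("Alabama", "Calhoun County", 118572),
    ("Alabama", "Etowah County", 104430),
    ("Wisconsin", "Washburn County", 15911),
    ("Wisconsin", "Washington County", 131887),
    ("Wisconsin", "Waukesha County", 389891),
    ("Wisconsin", "Winnebago County", 166994),
    ("Wisconsin", "Wood County", 74749),
    ("Wyoming", "Albany County", 36299),
    ("Wyoming", "Campbell County", 46133),
    ("Wyoming", "Laramie County", 91738),
    ("Wyoming", "Natrona County", 75450),
    ("Wyoming", "Teton County", 21294),
]
census_df = pd.DataFrame(rows, columns=["STNAME", "CTYNAME", "CENSUS2010POP"])

top3 = (
    census_df.set_index("CTYNAME")
    .groupby("STNAME")["CENSUS2010POP"]
    .nlargest(3)  # top 3 counties per state; index becomes (STNAME, CTYNAME)
    .reset_index()  # turn both index levels back into columns
    # States Z to A (as in the answer), then population, largest first
    .sort_values(["STNAME", "CENSUS2010POP"], ascending=[False, False])
    .reset_index(drop=True)
)
print(top3)
```

Sorting on values rather than on the index keeps the state grouping while making population, not county name, the order inside each state.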

Sum of the 3 largest values per state:

print(census_df.set_index('CTYNAME')
      .groupby('STNAME')['CENSUS2010POP']
      .apply(lambda x: x.nlargest(3).sum())
      .sort_index(ascending=False)
      .reset_index())

      STNAME  CENSUS2010POP
0    Wyoming         213321
1  Wisconsin         688772
2    Alabama         405267
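The apply(lambda ...) call runs a Python function once per state. A possibly simpler variant sums the per-state nlargest(3) result by the first index level, which holds the state name. The sketch below uses a hypothetical subset of the question's rows. On that sample it reproduces the three totals shown above. No speed difference is claimed here.

```python
import pandas as pd

# Hypothetical subset of the rows shown in the question.
rows = [
    ("Alabama", "Autauga County", 54571),
    ("Alabama", "Baldwin County", 182265),
    ("Alabama", "Bullock County", 10914),
    ("Alabama", "Calhoun County", 118572),
    ("Alabama", "Etowah County", 104430),
    ("Wisconsin", "Washburn County", 15911),
    ("Wisconsin", "Washington County", 131887),
    ("Wisconsin", "Waukesha County", 389891),
    ("Wisconsin", "Winnebago County", 166994),
    ("Wisconsin", "Wood County", 74749),
    ("Wyoming", "Albany County", 36299),
    ("Wyoming", "Campbell County", 46133),
    ("Wyoming", "Laramie County", 91738),
    ("Wyoming", "Natrona County", 75450),
    ("Wyoming", "Teton County", 21294),
]
census_df = pd.DataFrame(rows, columns=["STNAME", "CTYNAME", "CENSUS2010POP"])

top3_sum = (
    census_df.groupby("STNAME")["CENSUS2010POP"]
    .nlargest(3)  # index: (STNAME, original row label)
    .groupby(level=0)  # regroup by the state level only
    .sum()  # total population of each state's top 3 counties
    .sort_index(ascending=False)
    .reset_index()
)
print(top3_sum)
```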

Regarding python - `nth` breaks a sorted DataFrame in pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41161989/
