
python - Pandas: get the sorted index order for multiple columns

Reposted. Author: 行者123. Last updated: 2023-12-01 03:29:26

I have something like the following multi-indexed Pandas Series, where the values are indexed by team, year, and gender.

>>> import pandas as pd
>>> import numpy as np
>>> multi_index=pd.MultiIndex.from_product([['Team A','Team B', 'Team C', 'Team D'],[2015,2016],['Male','Female']], names = ['Team','Year','Gender'])
>>> np.random.seed(0)
>>> df=pd.Series(index=multi_index, data=np.random.randint(1, 10, 16))
>>> df
Team    Year  Gender
Team A  2015  Male      6
              Female    1
        2016  Male      4
              Female    4
Team B  2015  Male      8
              Female    4
        2016  Male      6
              Female    3
Team C  2015  Male      5
              Female    8
        2016  Male      7
              Female    9
Team D  2015  Male      9
              Female    2
        2016  Male      7
              Female    8

My goal is to get a DataFrame that lists the teams in rank order for each of the 4 year/gender combinations (Male 2015, Male 2016, Female 2015, and Female 2016).

My approach is to first unstack the Series so that it is indexed by team...

>>> unstacked_df = df.unstack(['Year','Gender'])
>>> print(unstacked_df)
Year     2015        2016
Gender   Male Female Male Female
Team
Team A      6      1    4      4
Team B      8      4    6      3
Team C      5      8    7      9
Team D      9      2    7      8

...and then build a DataFrame from the index order by looping over the 4 columns and sorting by each one in turn...

>>> team_orders = np.array([unstacked_df.sort_values(x).index.tolist() for x in unstacked_df.columns]).T
>>> result = pd.DataFrame(team_orders, columns=unstacked_df.columns)
>>> print(result)
Year      2015            2016
Gender    Male  Female    Male  Female
0       Team C  Team A  Team A  Team B
1       Team A  Team D  Team B  Team A
2       Team B  Team B  Team C  Team D
3       Team D  Team C  Team D  Team C

Is there a simpler or better way that I'm missing?

Best answer

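Starting from the unstacked version, you can combine .argsort() with .apply() to get the sort positions for each column, then use those positions as a lookup into the index: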

df.unstack([1,2]).apply(lambda x: x.index[x.argsort()]).reset_index(drop=True)

Year      2015            2016
Gender    Male  Female    Male  Female
0       Team C  Team A  Team A  Team B
1       Team A  Team D  Team B  Team A
2       Team B  Team B  Team C  Team D
3       Team D  Team C  Team D  Team C
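For readers on Python 3, here is a self-contained sketch of the same idea done with NumPy on the whole array at once. It is an alternative formulation, not the answer's exact code: the variable names (`s`, `unstacked`, `order`, `teams`) are my own, the values are hard-coded from the Series shown in the question, and `kind="stable"` is an assumption added so that tied teams keep their index order.

```python
import numpy as np
import pandas as pd

# Rebuild the example data; the values are copied from the Series in the question.
multi_index = pd.MultiIndex.from_product(
    [["Team A", "Team B", "Team C", "Team D"], [2015, 2016], ["Male", "Female"]],
    names=["Team", "Year", "Gender"],
)
values = [6, 1, 4, 4, 8, 4, 6, 3, 5, 8, 7, 9, 9, 2, 7, 8]
s = pd.Series(values, index=multi_index)

# One row per team, one column per (Year, Gender) combination.
unstacked = s.unstack(["Year", "Gender"])

# Sort positions for every column at once (axis=0 sorts down each column).
# kind="stable" keeps tied teams in their original index order.
order = np.argsort(unstacked.to_numpy(), axis=0, kind="stable")

# Look the positions up in the team labels; the result has the same 2-D shape.
teams = unstacked.index.to_numpy()[order]
result = pd.DataFrame(teams, columns=unstacked.columns)
print(result)
```

Because the lookup happens in NumPy, the result already has a plain 0..3 row index, so no reset_index call is needed in this variant.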

Edit: here is a bit more on how this works. With just .argsort(), you get:

print(df.unstack([1,2]).apply(lambda x: x.argsort()))

Year     2015        2016
Gender   Male Female Male Female
Team
Team A      2      0    0      1
Team B      0      3    1      0
Team C      1      1    2      3
Team D      3      2    3      2

The lookup part is essentially just doing the following for each column (shown here for the first one, 2015 Male):

df.unstack([1,2]).index[[2,0,1,3]]

Index(['Team C', 'Team A', 'Team B', 'Team D'], dtype='object', name='Team')

And .reset_index(drop=True) drops the index labels, which no longer mean anything at that point.
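One point that is easy to misread in the .argsort() table: the numbers are positions, not ranks. Entry k says where the k-th smallest value sits, so the 2 in the "Team A" row of the first column means "the smallest score is at position 2 (Team C)", not "Team A is ranked 2". A minimal sketch on a single column, using the 2015 Male values from the question (the names `col`, `positions`, and `ordered_teams` are my own, and the contrast with .rank() is an addition for illustration):

```python
import pandas as pd

# The 2015 Male column from the example, indexed by team.
col = pd.Series(
    [6, 8, 5, 9],
    index=pd.Index(["Team A", "Team B", "Team C", "Team D"], name="Team"),
)

# argsort gives positions: entry k is the position of the k-th smallest value.
positions = col.argsort().to_numpy()
print(positions.tolist())  # [2, 0, 1, 3]

# Looking those positions up in the index yields the teams in ascending order.
ordered_teams = col.index[positions]
print(ordered_teams.tolist())  # ['Team C', 'Team A', 'Team B', 'Team D']

# rank answers the opposite question: where does each team place?
ranks = col.rank(method="first").astype(int)
print(ranks.tolist())  # [2, 3, 1, 4]
```

So argsort plus an index lookup is the right tool when you want "which team is in each place", while rank is the one to reach for when you want "which place is each team in".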

Regarding "python - Pandas: get the sorted index order for multiple columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41092836/
