
python - Reindexing a level of a MultiIndex to an arbitrary order in Pandas

Reposted. Author: 太空狗. Updated: 2023-10-29 17:32:51

I have some code that summarizes a DataFrame containing the well-known Titanic dataset, as follows:

titanic['agecat'] = pd.cut(titanic.age, [0, 13, 20, 64, 100],
                           labels=['child', 'adolescent', 'adult', 'senior'])
titanic.groupby(['agecat', 'pclass', 'sex'])['survived'].mean()
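For context: `pd.cut` with `labels` returns a categorical whose categories are stored in the order of `labels`, so the natural order is already present in `agecat`; the alphabetical order in the output below appears to come from how that pandas version sorted the grouped index. A minimal check, assuming a recent pandas; the four ages are hypothetical, while the bins and labels are taken from the code above:

```python
import pandas as pd

# Hypothetical ages, one per bin; bins and labels are copied from the question.
ages = pd.Series([5, 15, 30, 70])
agecat = pd.cut(ages, [0, 13, 20, 64, 100],
                labels=['child', 'adolescent', 'adult', 'senior'])

# The result is categorical, and its categories keep the order of `labels`.
print(agecat.tolist())              # ['child', 'adolescent', 'adult', 'senior']
print(list(agecat.cat.categories))  # ['child', 'adolescent', 'adult', 'senior']
print(agecat.cat.ordered)           # True
```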

Because of the groupby call, this produces the following Series with a MultiIndex:

agecat      pclass  sex
adolescent  1       female    1.000000
                    male      0.200000
            2       female    0.923077
                    male      0.117647
            3       female    0.542857
                    male      0.125000
adult       1       female    0.965517
                    male      0.343284
            2       female    0.868421
                    male      0.078125
            3       female    0.441860
                    male      0.159184
child       1       female    0.000000
                    male      1.000000
            2       female    1.000000
                    male      1.000000
            3       female    0.483871
                    male      0.324324
senior      1       female    1.000000
                    male      0.142857
            2       male      0.000000
            3       male      0.000000
Name: survived, dtype: float64

However, I would like the agecat level of the MultiIndex to be sorted in its natural order rather than alphabetically, i.e. ['child', 'adolescent', 'adult', 'senior']. But if I try to do this with reindex:

titanic.groupby(['agecat', 'pclass', 'sex'])['survived'].mean().reindex(
    ['child', 'adolescent', 'adult', 'senior'], level='agecat')

it has no effect on the MultiIndex of the result. Should this work, or am I using the wrong method?

Best answer

You need to pass reindex a reordered MultiIndex, i.e. the full set of index tuples in the new order:

In [36]: index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                    ['one', 'two', 'three']],
                            codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                                   [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                            names=['first', 'second'])

In [37]: df = DataFrame(np.random.randn(10, 3), index=index,
                        columns=Index(['A', 'B', 'C'], name='exp'))

In [38]: df
Out[38]:
exp                  A         B         C
first second
foo   one    -1.007742  2.594146  1.211697
      two     1.280218  0.799940  0.039380
      three  -0.501615 -0.136437  0.997753
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   two    -0.326620  1.046366 -2.047380
      three   0.395894  1.128850 -1.126649
qux   one    -0.353886 -1.200079  0.493888
      two    -0.124532  0.114733  1.991793
      three  -1.042094  1.079344 -0.153037
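A short sketch of how `levels` and `codes` fit together, assuming pandas 0.24 or later, where the constructor argument formerly called `labels` is named `codes`. Row i of the index is built by looking up `codes[k][i]` in `levels[k]` for each level k. Judging by Out[40], sorting on 'second' orders rows by these integer codes rather than alphabetically, which is why 'foo' (code 0) comes before 'bar' (code 1) and 'two' comes before 'three'.

```python
import pandas as pd

# Same levels and codes as In [36]; `codes=` is the newer name for `labels=`.
index = pd.MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                              ['one', 'two', 'three']],
                      codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                             [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                      names=['first', 'second'])

# Rebuild every tuple by hand: row i is (levels[0][codes[0][i]], levels[1][codes[1][i]]).
rebuilt = [
    tuple(index.levels[k][index.codes[k][i]] for k in range(index.nlevels))
    for i in range(len(index))
]

print(rebuilt[:4])
# [('foo', 'one'), ('foo', 'two'), ('foo', 'three'), ('bar', 'one')]
print(rebuilt == list(index))  # True
```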

Simulate a reordering by sorting on the second level:

In [39]: idx = df.index.sortlevel(level='second')[0]

In [40]: idx
Out[40]:
MultiIndex
[(u'foo', u'one'), (u'bar', u'one'), (u'qux', u'one'), (u'foo', u'two'), (u'bar', u'two'), (u'baz', u'two'), (u'qux', u'two'), (u'foo', u'three'), (u'baz', u'three'), (u'qux', u'three')]

In [41]: df.reindex(idx)
Out[41]:
exp                  A         B         C
first second
foo   one    -1.007742  2.594146  1.211697
bar   one    -0.201222  0.060552  0.480552
qux   one    -0.353886 -1.200079  0.493888
foo   two     1.280218  0.799940  0.039380
bar   two    -0.758227  0.457597 -0.648014
baz   two    -0.326620  1.046366 -2.047380
qux   two    -0.124532  0.114733  1.991793
foo   three  -0.501615 -0.136437  0.997753
baz   three   0.395894  1.128850 -1.126649
qux   three  -1.042094  1.079344 -0.153037
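The target does not have to come from a sort: `reindex` accepts any permutation of the existing tuples, written out by hand if needed. A minimal sketch with a small hypothetical frame and fixed values in place of `np.random.randn`, so the resulting order is easy to follow:

```python
import pandas as pd

# Hypothetical four-row frame with a two-level index.
index = pd.MultiIndex.from_tuples(
    [('foo', 'one'), ('foo', 'two'), ('bar', 'one'), ('bar', 'two')],
    names=['first', 'second'])
df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=index)

# Hand-written target order: any permutation of the existing tuples.
target = pd.MultiIndex.from_tuples(
    [('bar', 'two'), ('foo', 'one'), ('bar', 'one'), ('foo', 'two')],
    names=['first', 'second'])
reordered = df.reindex(target)

print(reordered['A'].tolist())  # [4, 1, 3, 2]
```

A tuple that is missing from `df` would come back as a row of NaN, so building the target only from existing tuples keeps every row intact.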

A different order: the union of the two halves, which comes back sorted lexicographically:

In [42]: idx = idx[5:].union(idx[:5])

In [43]: idx
Out[43]:
MultiIndex
[(u'bar', u'one'), (u'bar', u'two'), (u'baz', u'three'), (u'baz', u'two'), (u'foo', u'one'), (u'foo', u'three'), (u'foo', u'two'), (u'qux', u'one'), (u'qux', u'three'), (u'qux', u'two')]

In [44]: df.reindex(idx)
Out[44]:
exp                  A         B         C
first second
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   three   0.395894  1.128850 -1.126649
      two    -0.326620  1.046366 -2.047380
foo   one    -1.007742  2.594146  1.211697
      three  -0.501615 -0.136437  0.997753
      two     1.280218  0.799940  0.039380
qux   one    -0.353886 -1.200079  0.493888
      three  -1.042094  1.079344 -0.153037
      two    -0.124532  0.114733  1.991793
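Applied to the question, one way to build that reordered MultiIndex is to sort the existing tuples with a key that ranks `agecat` by the desired order, then reindex with them. A sketch assuming a recent pandas; the Series hard-codes a subset of the values from the question's output instead of loading the Titanic data, and `order`, `rank` and `new_tuples` are illustrative names:

```python
import pandas as pd

# Subset of the question's output (pclass 1 only), in the alphabetical order groupby produced.
s = pd.Series(
    [1.000000, 0.200000, 0.965517, 0.343284,
     0.000000, 1.000000, 1.000000, 0.142857],
    index=pd.MultiIndex.from_tuples(
        [('adolescent', 1, 'female'), ('adolescent', 1, 'male'),
         ('adult', 1, 'female'), ('adult', 1, 'male'),
         ('child', 1, 'female'), ('child', 1, 'male'),
         ('senior', 1, 'female'), ('senior', 1, 'male')],
        names=['agecat', 'pclass', 'sex']),
    name='survived')

order = ['child', 'adolescent', 'adult', 'senior']
rank = {cat: i for i, cat in enumerate(order)}

# Sort the existing tuples by (agecat rank, pclass, sex); reusing only
# existing tuples means reindex introduces no NaN rows.
new_tuples = sorted(s.index, key=lambda t: (rank[t[0]], t[1], t[2]))
result = s.reindex(pd.MultiIndex.from_tuples(new_tuples, names=s.index.names))

print(result.index.get_level_values('agecat').tolist())
# ['child', 'child', 'adolescent', 'adolescent', 'adult', 'adult', 'senior', 'senior']
```

Sorting the existing tuples, rather than generating every combination with `MultiIndex.from_product`, avoids creating entries such as ('senior', 2, 'female') that do not occur in the question's data.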

For this question (python - Reindexing a level of a MultiIndex to an arbitrary order in Pandas), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19037159/
