
python - Pandas hierarchical sorting

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:07:40

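I have a dataframe of categories and amounts. Categories can be nested into subcategories to any depth using colon-separated strings. I want to sort it by amount in descending order, but hierarchically, as shown below.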

How I need it sorted:

CATEGORY                            AMOUNT
Transport                           5000
Transport : Car                     4900
Transport : Train                   100
Household                           1100
Household : Utilities               600
Household : Utilities : Water       400
Household : Utilities : Electric    200
Household : Cleaning                100
Household : Cleaning : Bathroom     75
Household : Cleaning : Kitchen      25
Household : Rent                    400
Living                              250
Living : Other                      150
Living : Food                       100

Edit: the dataframe:

import pandas as pd

df = pd.DataFrame({
    "category": ["Transport", "Transport : Car", "Transport : Train", "Household", "Household : Utilities", "Household : Utilities : Water", "Household : Utilities : Electric", "Household : Cleaning", "Household : Cleaning : Bathroom", "Household : Cleaning : Kitchen", "Household : Rent", "Living", "Living : Other", "Living : Food"],
    "amount": [5000, 4900, 100, 1100, 600, 400, 200, 100, 75, 25, 400, 250, 150, 100]
})

Note: this is the order I want. Before sorting, the rows can be in any order.

Edit 2: in case anyone is looking for a similar solution, I posted the one I settled on here: How to sort dataframe in pandas by value in hierarchical category structure

Best answer

One approach is to first str.split the category column.

df_ = df['category'].str.split(' : ', expand=True)
print(df_.head())
           0          1     2
0  Transport       None  None
1  Transport        Car  None
2  Transport      Train  None
3  Household       None  None
4  Household  Utilities  None

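Then take the amount column. What you want is the maximum amount per group, where the groups are defined by:

  • the first column only,
  • then the first and second columns,
  • then the first, second, and third columns, ...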

You can do this with groupby.transform and max, then concatenate the resulting columns.

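s = df['amount']
l_cols = list(df_.columns)  # level labels: 0, 1, 2, ...
# for each level lv: the max amount within each group sharing levels 0..lv;
# rows that have no value at level lv get NaN
dfa = pd.concat([s.groupby([df_[col] for col in range(0, lv+1)]).transform('max')
                 for lv in l_cols], keys=l_cols, axis=1)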
print(dfa)
       0       1      2
0   5000     NaN    NaN
1   5000  4900.0    NaN
2   5000   100.0    NaN
3   1100     NaN    NaN
4   1100   600.0    NaN
5   1100   600.0  400.0
6   1100   600.0  200.0
7   1100   100.0    NaN
8   1100   100.0   75.0
9   1100   100.0   25.0
10  1100   400.0    NaN
11   250     NaN    NaN
12   250   150.0    NaN
13   250   100.0    NaN
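
The NaN entries in this table come from how groupby handles missing keys: by default it drops rows whose group key is missing, so transform leaves NaN at those positions. A minimal sketch of that behavior, assuming a recent pandas version (1.5 or later); the three-row series below are made-up illustration data mirroring the first Household rows, not part of the original answer:

```python
import pandas as pd

# Hypothetical illustration data: a parent row and two rows under "Utilities".
amounts = pd.Series([1100, 600, 400])
level0 = pd.Series(["Household", "Household", "Household"])
level1 = pd.Series([None, "Utilities", "Utilities"])

# Group on both levels. The first row has no level-1 value, so it belongs
# to no group and the transform result is NaN there.
per_level1 = amounts.groupby([level0, level1]).transform("max")
print(per_level1)
```

Sorting with na_position='first' then places each parent row (NaN at the deeper levels) ahead of its own subcategories.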

Now you just need to sort by all the columns in the right order (first 0, then 1, then 2, ...), take the resulting index, and use loc to put df in the expected order.

dfa = dfa.sort_values(l_cols, na_position='first', ascending=False)
dfs = df.loc[dfa.index] #here you can reassign to df directly
print(dfs)
                            category  amount
0                          Transport    5000
1                    Transport : Car    4900
2                  Transport : Train     100
3                          Household    1100
4              Household : Utilities     600
5      Household : Utilities : Water     400
6   Household : Utilities : Electric     200
10                  Household : Rent     400 #here is the one difference with this data
7               Household : Cleaning     100
8    Household : Cleaning : Bathroom      75
9     Household : Cleaning : Kitchen      25
11                            Living     250
12                    Living : Other     150
13                     Living : Food     100
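
Putting the steps together, one possible wrapper is sketched below. The function name sort_hierarchical and its parameters are hypothetical, not from the original answer, and the sketch assumes the dataframe has a unique index. It differs from the code above in one detail: each level's maximum is computed only over the rows that actually have that level and is then reindexed, rather than relying on groupby to drop missing keys.

```python
import pandas as pd


def sort_hierarchical(df, cat_col="category", amt_col="amount", sep=" : "):
    """Return df with each category followed by its subcategories,
    siblings ordered by descending amount (hypothetical helper)."""
    parts = df[cat_col].str.split(sep, expand=True)
    levels = list(parts.columns)  # 0, 1, 2, ...
    keys = {}
    for lv in levels:
        has_level = parts[lv].notna()
        # Group only rows that reach this depth, by levels 0..lv.
        groupers = [parts.loc[has_level, c] for c in levels[: lv + 1]]
        level_max = df.loc[has_level, amt_col].groupby(groupers).transform("max")
        # Rows that stop above this depth get NaN for this level.
        keys[lv] = level_max.reindex(df.index)
    key_df = pd.DataFrame(keys)
    # Descending per level; NaN first so a parent precedes its children.
    order = key_df.sort_values(levels, ascending=False, na_position="first").index
    return df.loc[order]


data = pd.DataFrame({
    "category": [
        "Transport", "Transport : Car", "Transport : Train",
        "Household", "Household : Utilities",
        "Household : Utilities : Water", "Household : Utilities : Electric",
        "Household : Cleaning", "Household : Cleaning : Bathroom",
        "Household : Cleaning : Kitchen", "Household : Rent",
        "Living", "Living : Other", "Living : Food",
    ],
    "amount": [5000, 4900, 100, 1100, 600, 400, 200, 100, 75, 25, 400, 250, 150, 100],
})

# The input may arrive in any order, so shuffle before sorting.
shuffled = data.sample(frac=1, random_state=0)
result = sort_hierarchical(shuffled)
print(result.to_string(index=False))
```

One caveat with this key scheme: siblings with equal amounts share the same key, so their subtrees can interleave. Data with such ties would need an extra tie-breaker, for example the category name at each level.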

Regarding python - Pandas hierarchical sorting, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58888948/
