
python - Convert part of a DataFrame to a MultiIndex in Pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:05:44

I have data in this XLS format:

+--------+---------+-------------+---------------+---------+
| ID | Branch | Customer ID | Customer Name | Balance |
+--------+---------+-------------+---------------+---------+
| 111111 | Branch1 | 1 | Company A | 10 |
+--------+---------+-------------+---------------+---------+
| 222222 | Branch2 | 2 | Company B | 20 |
+--------+---------+-------------+---------------+---------+
| 111111 | Branch1 | 2 | Company B | 30 |
+--------+---------+-------------+---------------+---------+
| 222222 | Branch2 | 3 | Company C | 10 |
+--------+---------+-------------+---------------+---------+

I want to process it with Pandas. Pandas reads it as a single flat sheet, but I would like to use a MultiIndex here, for example:

+--------+---------+-------------+---------------+---------+
| ID | Branch | Customer ID | Customer Name | Balance |
+--------+---------+-------------+---------------+---------+
| | | 1 | Company A | 10 |
+ 111111 + Branch1 +-------------+---------------+---------+
| | | 2 | Company B | 30 |
+--------+---------+-------------+---------------+---------+
| | | 2 | Company B | 20 |
+ 222222 + Branch2 +-------------+---------------+---------+
| | | 3 | Company C | 10 |
+--------+---------+-------------+---------------+---------+

Here 111111 and Branch1 form the level-1 index, and 1 and Company A form the level-2 index. Is there a built-in way to do this?

Best answer

I think you need set_index, followed by sort_index if necessary:

df.set_index(['ID','Branch', 'Customer ID','Customer Name'], inplace=True)
df.sort_index(inplace=True)
print (df)
                                          Balance
ID     Branch  Customer ID Customer Name
111111 Branch1 1           Company A           10
               2           Company B           30
222222 Branch2 2           Company B           20
               3           Company C           10
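For anyone who wants to reproduce this without the XLS file, here is a minimal sketch that builds the question's four rows by hand. The variable names `indexed` and `branch1` are my own, and it uses the non-inplace form of the same calls, which should give the same result as the snippet above.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "ID": [111111, 222222, 111111, 222222],
    "Branch": ["Branch1", "Branch2", "Branch1", "Branch2"],
    "Customer ID": [1, 2, 2, 3],
    "Customer Name": ["Company A", "Company B", "Company B", "Company C"],
    "Balance": [10, 20, 30, 10],
})

# Non-inplace variant: each call returns a new DataFrame.
indexed = (
    df.set_index(["ID", "Branch", "Customer ID", "Customer Name"])
      .sort_index()
)

# Select every row under one (ID, Branch) pair with a partial key.
branch1 = indexed.loc[(111111, "Branch1")]
print(branch1)
```

Sorting first matters mostly for this kind of partial-key lookup, since pandas can slice a sorted MultiIndex more efficiently.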

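However, if you need only two levels in the MultiIndex (a and b in my solution), you first have to concatenate the first column with the second, and the third column with the fourth: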

df['a'] = df.ID.astype(str) + '_' + df.Branch
df['b'] = df['Customer ID'].astype(str) + '_' + df['Customer Name']
#delete original columns
df.drop(['ID','Branch', 'Customer ID','Customer Name'], axis=1, inplace=True)

df.set_index(['a','b'], inplace=True)
df.sort_index(inplace=True)
print (df)
                            Balance
a              b
111111_Branch1 1_Company A       10
               2_Company B       30
222222_Branch2 2_Company B       20
               3_Company C       10
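The same two-level idea as a self-contained sketch, in case it helps. The sample frame is rebuilt from the question's table, and `two_level` and `balance` are names I made up. It uses `assign` instead of modifying `df` in place, so the original columns stay available.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "ID": [111111, 222222, 111111, 222222],
    "Branch": ["Branch1", "Branch2", "Branch1", "Branch2"],
    "Customer ID": [1, 2, 2, 3],
    "Customer Name": ["Company A", "Company B", "Company B", "Company C"],
    "Balance": [10, 20, 30, 10],
})

# Build the two combined key columns, keep only them plus Balance,
# then index and sort on the combined keys.
two_level = (
    df.assign(
        a=df["ID"].astype(str) + "_" + df["Branch"],
        b=df["Customer ID"].astype(str) + "_" + df["Customer Name"],
    )[["a", "b", "Balance"]]
    .set_index(["a", "b"])
    .sort_index()
)

# Look up a single balance by its full (a, b) key.
balance = two_level.loc[("111111_Branch1", "2_Company B"), "Balance"]
print(balance)
```

Note that the combined keys are strings, so the numeric ID is no longer available as a number in the index.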

If you need to aggregate the last column by the preceding columns, use groupby with GroupBy.mean:

df = df.groupby(['ID','Branch', 'Customer ID','Customer Name'])['Balance'].mean().to_frame()
print (df)
                                          Balance
ID     Branch  Customer ID Customer Name
111111 Branch1 1           Company A           10
               2           Company B           30
222222 Branch2 2           Company B           20
               3           Company C           10
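Aggregation only makes a visible difference when a key combination occurs more than once, which the question's data does not have. The sketch below adds one hypothetical duplicate row (it is not in the original data) to show the effect. `agg` is my own name, and in recent pandas versions I would expect the mean to come back as a float column.

```python
import pandas as pd

# Question's rows plus ONE hypothetical duplicate of the first key
# (111111, Branch1, 1, Company A) with a different balance.
df = pd.DataFrame({
    "ID": [111111, 222222, 111111, 222222, 111111],
    "Branch": ["Branch1", "Branch2", "Branch1", "Branch2", "Branch1"],
    "Customer ID": [1, 2, 2, 3, 1],
    "Customer Name": ["Company A", "Company B", "Company B",
                      "Company C", "Company A"],
    "Balance": [10, 20, 30, 10, 30],
})

# Group on the four key columns; the keys become a MultiIndex and
# duplicate keys are collapsed to their mean balance.
agg = (
    df.groupby(["ID", "Branch", "Customer ID", "Customer Name"])["Balance"]
      .mean()
      .to_frame()
)
print(agg)
```

Unlike plain set_index, this always yields a unique index, at the cost of losing the individual rows.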

If the columns themselves are a MultiIndex, you need to pass tuples to set_index:

import pandas as pd
df.columns = pd.MultiIndex.from_arrays([['a'] * 2 + ['b']* 2 + ['c'], df.columns])
print (df)
        a                    b                     c
       ID   Branch Customer ID Customer Name Balance
0  111111  Branch1           1     Company A      10
1  222222  Branch2           2     Company B      20
2  111111  Branch1           2     Company B      30
3  222222  Branch2           3     Company C      10

df.set_index([('a','ID'), ('a','Branch'),
('b','Customer ID'), ('b','Customer Name')], inplace=True)
df.sort_index(inplace=True)
print (df)
                                                               c
                                                         Balance
(a, ID) (a, Branch) (b, Customer ID) (b, Customer Name)
111111  Branch1     1                Company A                    10
                    2                Company B                    30
222222  Branch2     2                Company B                    20
                    3                Company C                    10
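As the output shows, the index levels end up named after the tuples. If that is awkward, one option is to assign plain level names afterwards. The sketch below is a self-contained version of that idea; `keys` and `out` are my own names, and the top-level labels a, b and c are the same arbitrary ones used above.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "ID": [111111, 222222, 111111, 222222],
    "Branch": ["Branch1", "Branch2", "Branch1", "Branch2"],
    "Customer ID": [1, 2, 2, 3],
    "Customer Name": ["Company A", "Company B", "Company B", "Company C"],
    "Balance": [10, 20, 30, 10],
})

# Put an arbitrary top level (a, a, b, b, c) over the existing columns.
df.columns = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b", "c"], df.columns]
)

# With MultiIndex columns, each key passed to set_index is a tuple.
keys = [("a", "ID"), ("a", "Branch"),
        ("b", "Customer ID"), ("b", "Customer Name")]
out = df.set_index(keys).sort_index()

# Replace the tuple level names with the plain second-level labels.
out.index.names = [key[1] for key in keys]
print(out)
```

The remaining column is still addressed by its tuple, for example `out[("c", "Balance")]`.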

Regarding python - Convert part of a DataFrame to a MultiIndex in Pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39823852/
