
python - Adding missing index values in a MultiIndex pandas DataFrame

Reposted. Author: 行者123. Updated: 2023-12-01 07:51:25

Hi, I have a pandas DataFrame with a MultiIndex. Sorry for posting pictures, but I found them easier to explain with than plain code.

[image]

Because the data is inconsistent, some of my rows are missing Parent_category. In the sample data, the missing Parent_category values show up as blank cells.

To get the DataFrame you see in the picture, I grouped the data by Child_category.

How can I fill in the missing Parent_category field for rows that share the same Child_category?

Index structure:

MultiIndex(levels=[['Apps', 'Bars', 'Bath', 'Beer', 'Books', 'Breakfast', 'Cellar', 'Charity', 'Cleaning', 'Clothing', 'Co-working', 'Coffee', 'Dining', 'Drugs', 'Education', 'Electronics', 'Entertainment', 'Groceries', 'Hair Cut', 'Hotel', 'Icecream', 'Lunch', 'Maintenance', 'Massage', 'Museums', 'Music', 'Parking', 'Petroleum', 'Rent', 'Repair', 'Resident', 'Snacks', 'Souvenir', 'Souvenirs', 'Spa & yoga', 'Taxi', 'Tea', 'Transport', 'Traveling', 'Visa', 'Yoga', 'Канцелярия'], ['', 'Car', 'Drinks', 'Eatings', 'Home', 'Spa & yoga', 'Transport', 'Traveling', 'Utilities', 'iTunes']],
codes=[[0, 1, 1, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 9, 10, 11, 11, 12, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20, 21, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 36, 37, 37, 38, 39, 40, 41], [9, 0, 2, 4, 0, 2, 0, 0, 3, 0, 8, 0, 1, 0, 0, 0, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 4, 5, 7, 9, 1, 1, 1, 1, 4, 0, 7, 0, 0, 0, 0, 2, 0, 6, 0, 0, 5, 0]],
names=['Child_category', 'Parent_category'],
sortorder=0)

After reindexing, I get the following DataFrame. I suppose I could fill in the data inside a loop in O(n^2), but I am looking for an elegant solution.

[image]

Best answer

I believe you need:

import numpy as np
import pandas as pd

mux = pd.MultiIndex(levels=[['Apps', 'Bars', 'Bath', 'Beer', 'Books', 'Breakfast', 'Cellar', 'Charity', 'Cleaning', 'Clothing', 'Co-working', 'Coffee', 'Dining', 'Drugs', 'Education', 'Electronics', 'Entertainment', 'Groceries', 'Hair Cut', 'Hotel', 'Icecream', 'Lunch', 'Maintenance', 'Massage', 'Museums', 'Music', 'Parking', 'Petroleum', 'Rent', 'Repair', 'Resident', 'Snacks', 'Souvenir', 'Souvenirs', 'Spa & yoga', 'Taxi', 'Tea', 'Transport', 'Traveling', 'Visa', 'Yoga', 'Канцелярия'], ['', 'Car', 'Drinks', 'Eatings', 'Home', 'Spa & yoga', 'Transport', 'Traveling', 'Utilities', 'iTunes']],
codes=[[0, 1, 1, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 9, 10, 11, 11, 12, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20, 21, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 36, 37, 37, 38, 39, 40, 41], [9, 0, 2, 4, 0, 2, 0, 0, 3, 0, 8, 0, 1, 0, 0, 0, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 4, 5, 7, 9, 1, 1, 1, 1, 4, 0, 7, 0, 0, 0, 0, 2, 0, 6, 0, 0, 5, 0]],
names=['Child_category', 'Parent_category'],
sortorder=0)
df = pd.DataFrame({'a': range(52)}, index=mux)

For each Child_category level, get the first non-empty value:

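# Turn the empty-string label into NaN, then keep one row per Child_category
# holding the first non-missing value of each column.
print (df.rename({'': np.nan}, level=1)
         .reset_index()
         .groupby('Child_category')
         .first()
         .set_index('Parent_category', append=True)
         .head(20))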
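
To make the behaviour of first() easier to see, here is a minimal sketch on a small made-up frame. The toy rows and the names toy, flat and first_per_child are assumptions for illustration only and are not part of the original data. It marks the empty label as missing with Series.mask instead of rename, which is a swapped-in but equivalent step for this purpose.

```python
import pandas as pd

# Hypothetical toy data: 'Beer' appears once without a parent and once with one,
# 'Books' never has a parent.
toy_index = pd.MultiIndex.from_tuples(
    [("Beer", ""), ("Beer", "Drinks"), ("Books", ""), ("Taxi", "Transport")],
    names=["Child_category", "Parent_category"],
)
toy = pd.DataFrame({"a": [10, 20, 30, 40]}, index=toy_index)

# Move both index levels into columns and mark the empty label as missing.
flat = toy.reset_index()
flat["Parent_category"] = flat["Parent_category"].mask(flat["Parent_category"] == "")

# first() skips missing values column by column, so each group collapses to one row.
first_per_child = flat.groupby("Child_category").first()
print(first_per_child)
```

Note that first() works on each column separately: for Beer the Parent_category comes from the second row while a comes from the first row. A child such as Books, which has no known parent anywhere, stays missing.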

Or replace the empty values with the Parent_category value found within each Child_category group:

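out = df.rename({'': np.nan}, level=1).reset_index()
# Fill gaps inside each Child_category group; transform keeps every row in place.
out['Parent_category'] = (out.groupby('Child_category')['Parent_category']
                             .transform(lambda x: x.ffill().bfill()))
print (out.set_index(['Child_category', 'Parent_category'])
          .head(20))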
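
The group-wise fill can also be tried on a small made-up frame. This is only a sketch under stated assumptions: the toy rows and the names toy, flat, child, parent and filled are hypothetical, and it uses the built-in groupby ffill and bfill methods rather than a lambda.

```python
import pandas as pd

# Hypothetical toy data: the known parent sits after the gap for 'Beer'
# and before the gap for 'Lunch'; 'Books' has no known parent at all.
toy_index = pd.MultiIndex.from_tuples(
    [("Beer", ""), ("Beer", "Drinks"), ("Books", ""),
     ("Lunch", "Eatings"), ("Lunch", "")],
    names=["Child_category", "Parent_category"],
)
toy = pd.DataFrame({"a": range(5)}, index=toy_index)

flat = toy.reset_index()
child = flat["Child_category"]

# Mark the empty label as missing so the fill methods can see it.
parent = flat["Parent_category"].mask(flat["Parent_category"] == "")

# Within each Child_category, copy a known parent forward, then backward.
parent = parent.groupby(child).ffill()
parent = parent.groupby(child).bfill()

flat["Parent_category"] = parent
filled = flat.set_index(["Child_category", "Parent_category"])
print(filled)
```

All five rows are kept. Both Beer rows end up under Drinks and both Lunch rows under Eatings, while Books keeps a missing parent because there is nothing in its group to copy from.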

Regarding python - Adding missing index values in a MultiIndex pandas DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56197158/
