python - Pandas: Add columns to a MultiIndex for arbitrary depth of index levels

Reposted. Author: 行者123. Updated: 2023-12-01 00:41:35

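I want to add columns for the missing level values (level index = 1) under each parent level (level index = 0) of a DataFrame's columns. For a simple DataFrame this works fine: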

import numpy as np
import pandas as pd

index = [['A', 'B', 'C', 'D'], ['a', 'b', 'a', 'b']]
cols = [['AC', 'AC', 'BC', 'DC', 'CC'], ['ac', 'aac', 'bc', 'ac', 'bc']]
data = np.random.random((4, 5))
df = pd.DataFrame(data=data, index=index, columns=cols)
df.columns.names = ['col_name_0', 'col_name_1']

DataFrame:

col_name_0        AC                  BC        DC        CC
col_name_1        ac       aac        bc        ac        bc
A a         0.169402  0.899434  0.644941  0.330402  0.805702
B b         0.933743  0.994497  0.060507  0.609129  0.545999
C a         0.064937  0.686350  0.740594  0.985218  0.717699
D b         0.151031  0.932294  0.948751  0.538251  0.085700

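Processing steps:

# Position of the 'col_name_1' level and every value that level can take
feature_index = [index for index, item in enumerate(df.columns.names) if item == 'col_name_1'][0]
all_features = df.columns.levels[feature_index].to_list()

# Note: groupby(..., axis=1) is deprecated in recent pandas releases (2.1 and later)
for idx, item in df.groupby(level=0, axis=1):
    features = item.columns.get_level_values(1).to_list()
    missing = list(set(all_features) - set(features))
    for m_item in missing:
        # Add an all-NaN column for each missing (parent, child) pair
        df[idx, m_item] = np.nan * np.ones(df.shape[0])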

Processed df:

col_name_0        AC                  BC       ...   CC             DC
col_name_1       aac        ac   bc  aac   ac  ...   ac        bc  aac        ac   bc
A a         0.561247  0.353270  NaN  NaN  NaN  ...  NaN  0.733714  NaN  0.343174  NaN
B b         0.699053  0.696892  NaN  NaN  NaN  ...  NaN  0.144768  NaN  0.267141  NaN
C a         0.624581  0.064629  NaN  NaN  NaN  ...  NaN  0.856559  NaN  0.772735  NaN
D b         0.563903  0.192823  NaN  NaN  NaN  ...  NaN  0.071497  NaN  0.000361  NaN

But for a DataFrame with more column levels, like the one below, this approach fails:

index = [['A', 'B', 'C', 'D'], ['a', 'b', 'a', 'b']]
cols = [['AC', 'AC', 'BC', 'DC', 'CC'], ['ac', 'aac', 'bc', 'ac', 'bc'], ['Xc', 'Xc', 'Xc', 'Xc', 'Xc']]
data = np.random.random((4, 5))
df = pd.DataFrame(data=data, index=index, columns=cols)
df.columns.names = ['col_name_0', 'col_name_1', 'col_name_2']

Original DataFrame:

col_name_0        AC                  BC        DC        CC
col_name_1        ac       aac        bc        ac        bc
col_name_2        Xc        Xc        Xc        Xc        Xc
A a         0.317022  0.700635  0.305712  0.934382  0.315501
B b         0.601277  0.726890  0.737907  0.571935  0.716260
C a         0.679046  0.314987  0.846560  0.962516  0.770071
D b         0.124029  0.626421  0.967531  0.193875  0.395897

Processing steps:

feature_index = [index for index, item in enumerate(df.columns.names) if item == 'col_name_1'][0]
all_features = df.columns.levels[feature_index].to_list()

for idx, item in df.groupby(level=0, axis=1):
    features = item.columns.get_level_values(1).to_list()
    missing = list(set(all_features) - set(features))
    for m_item in missing:
        # The key (idx, m_item) has 2 elements, but the columns now have 3 levels
        df[idx, m_item] = np.nan * np.ones(df.shape[0])

Error message:

ValueError: Item must have length equal to number of levels.

Any ideas how I can make my approach more general so that it accepts any number of column levels?

Best answer

So you can just use stack and unstack:

out = df.stack(level=1).unstack().swaplevel(1, 2, axis=1)
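
To see what the one-liner does, here is a minimal, self-contained sketch that rebuilds the three-level example from the question and applies it. The seeded generator `rng` and the final `sort_index` call are additions for repeatability and readability, not part of the answer. The second variant (`full_cols`, `out_reindex`) is an alternative based on `MultiIndex.from_product` and `reindex`. It is an assumption about what could also work here, not something the answer proposes.

```python
import numpy as np
import pandas as pd

# Rebuild the three-level example from the question (seeded so runs are repeatable).
index = [['A', 'B', 'C', 'D'], ['a', 'b', 'a', 'b']]
cols = [['AC', 'AC', 'BC', 'DC', 'CC'],
        ['ac', 'aac', 'bc', 'ac', 'bc'],
        ['Xc', 'Xc', 'Xc', 'Xc', 'Xc']]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 5)), index=index, columns=cols)
df.columns.names = ['col_name_0', 'col_name_1', 'col_name_2']

# Approach from the answer: move col_name_1 into the row index, move it back
# into the columns (it lands as the innermost level), then restore the order.
out = df.stack(level=1).unstack().swaplevel(1, 2, axis=1).sort_index(axis=1)

# Alternative (assumption, not from the answer): reindex the columns against
# the Cartesian product of all level values; missing combinations become NaN.
full_cols = pd.MultiIndex.from_product(df.columns.levels, names=df.columns.names)
out_reindex = df.reindex(columns=full_cols)

print(out.shape, out_reindex.shape)
print(out.columns.names)
```

The two variants need not agree in general. The stack/unstack version only combines the (col_name_0, col_name_2) pairs that already exist with every col_name_1 value, while the `from_product` version creates every combination of all levels. In this example col_name_2 has a single value, so both give the same set of columns.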

Regarding python - Pandas: Add columns to a MultiIndex for arbitrary depth of index levels, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57310930/
