
python - Reindex the second level of an incomplete multi-level DataFrame to make it complete, inserting NaN on the missing rows

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:06:42

I need to reindex the 2nd level of a pandas DataFrame so that, for every 1st-level index value, the 2nd level is the complete list 0, ..., (N-1).

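  • I tried Allan/Hayden's approach, but unfortunately it only creates an index with as many rows as already existed.
  • What I want is for new rows (with NaN values) to be inserted for every index entry that is missing.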

Example:

import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6]
})
print(df)

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6

# Tried Allan/Hayden's approach, but it only renumbers the existing rows;
# it does not add the missing ones
df['second'] = df.reset_index().groupby(['first']).cumcount()
print(df)
   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       0      6

The result I want is:

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
4    two       2    nan  <-- INSERTED
5  three       0      6
5  three       1    nan  <-- INSERTED
5  three       2    nan  <-- INSERTED

Best answer

I think you can first set the columns 'first' and 'second' as a multi-level index, and then reindex against the complete index.

# your data
# ==========================
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6]
})

df

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6

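# processing
# ============================
# Build the complete index: every 'first' label paired with second = 0, 1, 2
multi_index = pd.MultiIndex.from_product([df['first'].unique(), np.arange(3)], names=['first', 'second'])

# Reindex against it; (first, second) pairs not in the data become NaN rows
df.set_index(['first', 'second']).reindex(multi_index).reset_index()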

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5    two       2    NaN
6  three       0    NaN
7  three       1      6
8  three       2    NaN
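
The answer hard-codes the second-level length as 3, and it leaves the existing 'three' row at second = 1, whereas the desired output in the question shows that row at second = 0. Below is a minimal sketch of one way to generalize this. The helper name complete_second_level and its parameters n and renumber are hypothetical, not part of pandas or the original answer. It assumes N should default to the largest 'second' value plus one, and that each (first, second) pair occurs at most once, since reindex raises an error on a duplicated index.

```python
import numpy as np
import pandas as pd


def complete_second_level(df, n=None, renumber=False):
    """Return df with every 'first' group holding rows second = 0..n-1.

    Hypothetical helper; the column names 'first' and 'second' are the
    ones used in the example above.
    """
    out = df.copy()
    if renumber:
        # Optional: renumber the existing rows 0, 1, 2, ... within each
        # group (the cumcount step from the question) before filling gaps.
        out['second'] = out.groupby('first').cumcount()
    if n is None:
        # Assumption: the target length is the largest 'second' value + 1.
        n = int(out['second'].max()) + 1
    # Complete grid of (first, second) pairs, in order of first appearance.
    full_index = pd.MultiIndex.from_product(
        [out['first'].unique(), np.arange(n)], names=['first', 'second']
    )
    # Pairs absent from the data come back as rows filled with NaN.
    return out.set_index(['first', 'second']).reindex(full_index).reset_index()


df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6],
})

filled = complete_second_level(df)                     # same rows as the answer
renumbered = complete_second_level(df, renumber=True)  # layout asked for in the question
print(filled)
print(renumbered)
```

With renumber=True the value 6 ends up on the ('three', 0) row, as in the question's desired output; with the default it stays on ('three', 1), as in the answer. In both cases the value column is converted to float, because NaN cannot be stored in an integer column.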

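Regarding python - Reindex the second level of an incomplete multi-level DataFrame to make it complete, inserting NaN on the missing rows, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31901821/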
