
python - Using fillna with two MultiIndex DataFrames raises InvalidIndexError

Reposted. Author: 行者123. Updated: 2023-12-03 16:10:59

I have two DataFrames like this:

import pandas as pd
import numpy as np


df1 = pd.DataFrame({
'key1': list('ABAACCA'),
'key2': list('1675987'),
'prop1': list('xyzuynb'),
'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
'key1': list('ABCCADD'),
'key2': list('1598787'),
'prop1': [np.nan] * 7,
'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])

df1:

          prop1 prop2
key1 key2
A    1        x     m
B    6        y     n
A    7        z     b
     5        u     b
C    9        y     b
     8        n     a
A    7        b     s

df2:

          prop1 prop2
key1 key2
A    1      NaN   NaN
B    5      NaN   NaN
C    9      NaN   NaN
     8      NaN   NaN
A    7      NaN   NaN
D    8      NaN   NaN
     7      NaN   NaN
Now I want to use df1 to fill df2:
df2.fillna(df1)
However, I get:

site-packages/pandas/core/generic.py in _where(self, cond, other, inplace, axis, level, errors, try_cast)
   8694                 other._get_axis(i).equals(ax) for i, ax in enumerate(self.axes)
   8695             ):
-> 8696                 raise InvalidIndexError
   8697
   8698             # slice me out of the other

InvalidIndexError:


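I have used this approach successfully in the past, but I really don't understand why it fails here. Any ideas how to make it work?
EDIT
Here is a very similar example that works fine: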
filler1 = pd.DataFrame({
'key': list('AAABCCDD'),
'prop1': list('xyzuyasj'),
'prop2': list('mnbbbqwo')
})

tobefilled1 = pd.DataFrame({
'key': list('AAABBCACDF'),
'keep_me': ['stuff'] * 10,
'prop1': [np.nan] * 10,
'prop2': [np.nan] * 10,

})

filler1['g'] = filler1.groupby('key').cumcount()
tobefilled1['g'] = tobefilled1.groupby('key').cumcount()

filler1 = filler1.set_index(['key', 'g'])
tobefilled1 = tobefilled1.set_index(['key', 'g'])

print(tobefilled1.fillna(filler1))

prints:

      keep_me prop1 prop2
key g
A   0   stuff     x     m
    1   stuff     y     n
    2   stuff     z     b
B   0   stuff     u     b
    1   stuff   NaN   NaN
C   0   stuff     y     b
A   3   stuff   NaN   NaN
C   1   stuff     a     q
D   0   stuff     s     w
F   0   stuff   NaN   NaN

Best answer

The problem here is the duplicate index in df1:

df1 = pd.DataFrame({
'key1': list('ABAACCA'),
'key2': list('1675987'),
'prop1': list('xyzuynb'),
'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])
Note: the key pair key1 = A, key2 = 7 appears twice, so the index of df1 is not unique.
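Before calling fillna, you can confirm this with Index.is_unique and list every row involved with Index.duplicated(keep=False). The sketch below runs on the question's df1. It uses only standard pandas calls, and the variable name dupes is just for illustration:

```python
import pandas as pd

# df1 from the question, indexed by (key1, key2)
df1 = pd.DataFrame({
    'key1': list('ABAACCA'),
    'key2': list('1675987'),
    'prop1': list('xyzuynb'),
    'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

# False: at least one (key1, key2) pair occurs more than once
print(df1.index.is_unique)

# keep=False flags every occurrence of a repeated key, not only the later ones
dupes = df1[df1.index.duplicated(keep=False)]
print(dupes)
```

In this data, dupes holds the two (A, 7) rows, z/b and b/s.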
Let's change the second A7 to A9:
df1 = pd.DataFrame({
'key1': list('ABAACCA'),
'key2': list('1675989'),
'prop1': list('xyzuynb'),
'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
'key1': list('ABCCADD'),
'key2': list('1598787'),
'prop1': [np.nan] * 7,
'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])
This gives df1 a unique index. Now try df2.fillna:
df2.fillna(df1)
Output:
          prop1 prop2
key1 key2
A    1        x     m
B    5      NaN   NaN
C    9        y     b
     8        n     a
A    7        z     b
D    8      NaN   NaN
     7      NaN   NaN
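Suppose you cannot change the keys and it is acceptable to use only the first row for each repeated key. Both conditions are assumptions: the answer itself edits the data instead. In that case, one alternative is to drop the repeats with Index.duplicated before filling. With keep='first', the (A, 7) row from df1 keeps z/b, which matches the output above:

```python
import numpy as np
import pandas as pd

# Original frames from the question (df1 still has (A, 7) twice)
df1 = pd.DataFrame({
    'key1': list('ABAACCA'),
    'key2': list('1675987'),
    'prop1': list('xyzuynb'),
    'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
    'key1': list('ABCCADD'),
    'key2': list('1598787'),
    'prop1': [np.nan] * 7,
    'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])

# Keep the first row for each (key1, key2) pair; later repeats are dropped,
# so the filler's index is unique and fillna can align it with df2
df1_first = df1[~df1.index.duplicated(keep='first')]

filled = df2.fillna(df1_first)
print(filled)
```

This discards data silently (the b/s row for A7), so it only fits when the later duplicates really are redundant.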
I got the hint for this by trying the reindex_like method. First, with the unique index:
df1 = pd.DataFrame({
'key1': list('ABAACCA'),
'key2': list('1675989'),
'prop1': list('xyzuynb'),
'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
'key1': list('ABCCADD'),
'key2': list('1598787'),
'prop1': [np.nan] * 7,
'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])
print(df1.reindex_like(df2))
Output:
          prop1 prop2
key1 key2
A    1        x     m
B    5      NaN   NaN
C    9        y     b
     8        n     a
A    7        z     b
D    8      NaN   NaN
     7      NaN   NaN
Now, let's go back to the original DataFrames from the question:
df1 = pd.DataFrame({
'key1': list('ABAACCA'),
'key2': list('1675987'),
'prop1': list('xyzuynb'),
'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
'key1': list('ABCCADD'),
'key2': list('1598787'),
'prop1': [np.nan] * 7,
'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])
print(df1.reindex_like(df2))
Output (a ValueError):
ValueError: cannot handle a non-unique multi-index!
Another workaround is to make the index unique by adding another index level that holds a cumulative count (cumcount) of each key.
df1 = pd.DataFrame({
'key1': list('ABAACCA'),
'key2': list('1675987'),
'prop1': list('xyzuynb'),
'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
'key1': list('ABCCADD'),
'key2': list('1598787'),
'prop1': [np.nan] * 7,
'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])

df1 = df1.set_index(df1.groupby(df1.index).cumcount(), append=True)
df2 = df2.set_index(df2.groupby(df2.index).cumcount(), append=True)

df2.fillna(df1)
Output:
            prop1 prop2
key1 key2
A    1    0     x     m
B    5    0   NaN   NaN
C    9    0     y     b
     8    0     n     a
A    7    0     z     b
D    8    0   NaN   NaN
     7    0   NaN   NaN
Then you can drop index level 2:
df2.fillna(df1).reset_index(level=2, drop=True)
Output:
          prop1 prop2
key1 key2
A    1        x     m
B    5      NaN   NaN
C    9        y     b
     8        n     a
A    7        z     b
D    8      NaN   NaN
     7      NaN   NaN
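The cumcount workaround can also be wrapped in a small reusable function. In the sketch below, the name fillna_by_occurrence is hypothetical. The function groups with groupby(level=...) instead of groupby(df.index), which is intended to number the rows the same way. It pairs the n-th occurrence of a key in the target with the n-th occurrence of that key in the source:

```python
import numpy as np
import pandas as pd


def fillna_by_occurrence(target, source):
    """Hypothetical helper: fill NaNs in target from source, matching repeated
    index keys by order of occurrence (1st with 1st, 2nd with 2nd, ...)."""
    n_levels = target.index.nlevels

    def add_occurrence_level(df):
        # Number the rows within each group of identical index keys: 0, 1, 2, ...
        occurrence = df.groupby(level=list(range(df.index.nlevels))).cumcount()
        # Append the counter as an extra (unnamed) index level, making keys unique
        return df.set_index(occurrence, append=True)

    filled = add_occurrence_level(target).fillna(add_occurrence_level(source))
    # Remove the helper level; it sits right after the original levels
    return filled.droplevel(n_levels)


# Original frames from the question
df1 = pd.DataFrame({
    'key1': list('ABAACCA'),
    'key2': list('1675987'),
    'prop1': list('xyzuynb'),
    'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
    'key1': list('ABCCADD'),
    'key2': list('1598787'),
    'prop1': [np.nan] * 7,
    'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])

result = fillna_by_occurrence(df2, df1)
print(result)
```

Pairing by occurrence assumes that the row order within each repeated key is meaningful in both frames. If it is not, the first-row or renaming approaches are safer.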
However, I think pandas should give a clearer error message for fillna with non-unique MultiIndexes, as it already does for reindex_like.
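In the meantime, a hypothetical wrapper can fail early with a readable message. Note that checked_fillna is not a pandas function. This sketch checks only the frame used for filling, because that is where the duplicate sits in the question:

```python
import numpy as np
import pandas as pd


def checked_fillna(target, source):
    """Hypothetical wrapper around DataFrame.fillna that reports duplicate keys."""
    if not source.index.is_unique:
        # duplicated() flags the second and later occurrences of each key
        dupes = source.index[source.index.duplicated()].unique()
        raise ValueError(
            f"fillna source has a non-unique index; duplicated keys: {list(dupes)}"
        )
    return target.fillna(source)


# Original frames from the question
df1 = pd.DataFrame({
    'key1': list('ABAACCA'),
    'key2': list('1675987'),
    'prop1': list('xyzuynb'),
    'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
    'key1': list('ABCCADD'),
    'key2': list('1598787'),
    'prop1': [np.nan] * 7,
    'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])

try:
    checked_fillna(df2, df1)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```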

Regarding "python - Using fillna with two MultiIndex DataFrames raises InvalidIndexError", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62789981/

Copyright 2021 - 2024 cfsdn All Rights Reserved. Sichuan ICP Registration No. 2022000587 (蜀ICP备2022000587号)