
python - Setting NaNs across multiple pandas DataFrames

Reposted. Author: 太空狗. Updated: 2023-10-30 01:53:20

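I have a number of similar dataframes, and I want to standardize the NaNs across all of them. For example, if a NaN exists at df1.loc[0,'a'], then every other dataframe should be set to NaN at the same index location.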

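I know I could combine the dataframes into one large multi-indexed dataframe, but sometimes I find it easier to work with a group of identically structured dataframes.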

Here is an example:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.reshape(np.arange(12), (4,3)), columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.reshape(np.arange(12), (4,3)), columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.reshape(np.arange(12), (4,3)), columns=['a', 'b', 'c'])

df1.loc[3,'a'] = np.nan
df2.loc[1,'b'] = np.nan
df3.loc[0,'c'] = np.nan

print(df1)
print(' ')
print(df2)
print(' ')
print(df3)

Output:

     a   b   c
0  0.0   1   2
1  3.0   4   5
2  6.0   7   8
3  NaN  10  11

   a     b   c
0  0   1.0   2
1  3   NaN   5
2  6   7.0   8
3  9  10.0  11

   a   b     c
0  0   1   NaN
1  3   4   5.0
2  6   7   8.0
3  9  10  11.0

However, I would like df1, df2, and df3 to have NaNs in the same locations:

print(df1)
     a     b     c
0  0.0   1.0   NaN
1  3.0   NaN   5.0
2  6.0   7.0   8.0
3  NaN  10.0  11.0

Using the answer provided by piRSquared, I was able to extend it to dataframes of different sizes. Here is the function:

def set_nans_over_every_df(df_list):
    # Find unique index and column values
    complete_index = sorted(set([idx for df in df_list for idx in df.index]))
    complete_columns = sorted(set([idx for df in df_list for idx in df.columns]))

    # Ensure that every df has the same indexes and columns
    df_list = [df.reindex(index=complete_index, columns=complete_columns) for df in df_list]

    # Find the nans in each df and set nans in every other df at the same location
    mask = np.isnan(np.stack([df.values for df in df_list])).any(0)
    df_list = [df.mask(mask) for df in df_list]

    return df_list

And an example using dataframes of different sizes:

df1 = pd.DataFrame(np.reshape(np.arange(15), (5,3)), index=[0,1,2,3,4], columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.reshape(np.arange(12), (4,3)), index=[0,1,2,3], columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.reshape(np.arange(16), (4,4)), index=[0,1,2,3], columns=['a', 'b', 'c', 'd'])

df1.loc[3,'a'] = np.nan
df2.loc[1,'b'] = np.nan
df3.loc[0,'c'] = np.nan

df1, df2, df3 = set_nans_over_every_df([df1, df2, df3])

print(df1)

     a     b     c    d
0  0.0   1.0   NaN  NaN
1  3.0   NaN   5.0  NaN
2  6.0   7.0   8.0  NaN
3  NaN  10.0  11.0  NaN
4  NaN   NaN   NaN  NaN
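
One caveat about the function above: np.isnan only accepts numeric arrays, so it would raise a TypeError if any frame held a string (object) column. The following is a minimal sketch of a variant that builds the mask with DataFrame.isna() instead. The name set_nans_aligned and the sample frames are hypothetical and not part of the original post.

```python
from functools import reduce

import numpy as np
import pandas as pd


def set_nans_aligned(df_list):
    """Hypothetical variant of set_nans_over_every_df built on DataFrame.isna()."""
    # Collect every row label and column label that appears in any frame
    index = df_list[0].index
    columns = df_list[0].columns
    for df in df_list[1:]:
        index = index.union(df.index)
        columns = columns.union(df.columns)

    # Give every frame the same shape; labels a frame lacked become missing
    aligned = [df.reindex(index=index, columns=columns) for df in df_list]

    # Element-wise OR of the per-frame missing-value masks
    mask = reduce(lambda a, b: a | b, (df.isna() for df in aligned))

    return [df.mask(mask) for df in aligned]


# Sample frames with a string column, which np.isnan could not handle
left = pd.DataFrame({"a": [1.0, np.nan, 3.0], "label": ["x", "y", None]})
right = pd.DataFrame({"a": [1.0, 2.0, 3.0], "label": ["p", None, "r"]})

left_out, right_out = set_nans_aligned([left, right])
print(left_out)
print(right_out)
```

Because isna() treats None and NaN alike, this version works for mixed dtypes, at the cost of building one boolean DataFrame per input frame.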

Best answer

I would build a mask in numpy and then pass that mask to the pd.DataFrame.mask method.

mask = np.isnan(np.stack([d.values for d in [df1, df2, df3]])).any(0)

print(df1.mask(mask))

     a     b     c
0  0.0   1.0   NaN
1  3.0   NaN   5.0
2  6.0   7.0   8.0
3  NaN  10.0  11.0

print(df2.mask(mask))

     a     b     c
0  0.0   1.0   NaN
1  3.0   NaN   5.0
2  6.0   7.0   8.0
3  NaN  10.0  11.0

print(df3.mask(mask))

     a     b     c
0  0.0   1.0   NaN
1  3.0   NaN   5.0
2  6.0   7.0   8.0
3  NaN  10.0  11.0
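
To make the stacking step easier to follow, here is a small self-contained sketch that rebuilds the three frames and inspects the intermediate arrays. The frames are created with a float dtype here, which is an assumption made so that assigning NaN does not change a column's dtype; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Three identical 4x3 float frames, each given one NaN in a different cell
frames = [
    pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3), columns=["a", "b", "c"])
    for _ in range(3)
]
frames[0].loc[3, "a"] = np.nan
frames[1].loc[1, "b"] = np.nan
frames[2].loc[0, "c"] = np.nan

# np.stack adds a new leading axis: three (4, 3) arrays become one (3, 4, 3) array
stacked = np.stack([d.to_numpy() for d in frames])
print(stacked.shape)

# any(axis=0) collapses that leading axis, leaving a (4, 3) boolean array that is
# True wherever at least one of the frames holds a NaN
mask = np.isnan(stacked).any(axis=0)
print(mask)

# DataFrame.mask replaces the cells where the condition is True with NaN
masked = [d.mask(mask) for d in frames]
print(masked[0])
```

Note that passing a plain numpy array to mask relies on every frame having the same shape and the same row and column order.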

Regarding python - setting NaNs across multiple pandas DataFrames, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41703618/
