gpt4 book ai didi

Python:如何用中位数逐列替换缺失值

转载 作者:行者123 更新时间:2023-12-01 08:17:56 28 4
gpt4 key购买 nike

我有一个数据框如下

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.45, 2.33, np.nan], 'C': [4, 5, 6], 'D': [4.55, 7.36, np.nan]}) 

我想以通用方式替换缺失值,即np.nan。为此,我创建了一个函数,如下所示

def treat_mis_value_nu(df):
df_nu = df.select_dtypes(include=['number'])
lst_null_col = df_nu.columns[df_nu.isnull().any()].tolist()
if len(lst_null_col)>0:
for i in lst_null_col:
if df_nu[i].isnull().sum()/len(df_nu[i])>0.10:
df_final_nu = df_nu.drop([i],axis=1)
else:
df_final_nu = df_nu[i].fillna(df_nu[i].median(),inplace=True)
return df_final_nu

当我按如下方式应用此功能时

df_final = treat_mis_value_nu(df)

我得到的数据框如下

    A    B  C
0 1 1.0 4
1 2 2.0 5
2 3 NaN 6

因此它实际上正确删除了D列,但未能删除B列。我知道过去有人讨论过这个话题(here)。我仍然可能会遗漏一些东西吗?

最佳答案

用途:

df = pd.DataFrame({'A': [1, 2, 3,5,7], 'B': [1.45, 2.33, np.nan, np.nan, np.nan], 
'C': [4, 5, 6,8,7], 'D': [4.55, 7.36, np.nan,9,10],
'E':list('abcde')})
print (df)
A B C D E
0 1 1.45 4 4.55 a
1 2 2.33 5 7.36 b
2 3 NaN 6 NaN c
3 5 NaN 8 9.00 d
4 7 NaN 7 10.00 e

def treat_mis_value_nu(df):
#get only numeric columns to dataframe
df_nu = df.select_dtypes(include=['number'])
#get only columns with NaNs
df_nu = df_nu.loc[:, df_nu.isnull().any()]
#get columns for remove with mean instead sum/len, it is same
cols_to_drop = df_nu.columns[df_nu.isnull().mean() <= 0.30]
#replace missing values of original columns and remove above thresh
return df.fillna(df_nu.median()).drop(cols_to_drop, axis=1)

print (treat_mis_value_nu(df))
A C D E
0 1 4 4.55 a
1 2 5 7.36 b
2 3 6 8.18 c
3 5 8 9.00 d
4 7 7 10.00 e

关于Python:如何用中位数逐列替换缺失值,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54879621/

28 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com