
python - How to drop float features whose % of NaN is above a certain number?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:08:51

I am trying to drop a feature (column) if it is a float and its share of missing values is above a certain threshold.

I have tried:

# Define threshold to 1/6
threshold = 0.1667

# Drop float > threshold
for f in data:
    if data[f].dtype==float & data[f].isnull().sum() / data.shape[0] > threshold: del data[f]

...which raises an error:

TypeError: unsupported operand type(s) for &: 'type' and 'numpy.float64'

Any help would be appreciated.

Best answer

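Use DataFrame.select_dtypes to keep only the float columns, test for missing values and take the mean (that is, sum/count, the fraction of NaN per column), then add the non-float columns back with Series.reindex. Finally, filter with boolean indexing using the inverted condition (> becomes <=), so that the columns to keep are selected: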

import numpy as np
import pandas as pd

np.random.seed(2019)
df = pd.DataFrame(np.random.choice([np.nan,1], p=(0.2,0.8),size=(10,10))).assign(A='a')
print (df)
     0    1    2    3    4    5    6    7    8    9  A
0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
1  1.0  1.0  NaN  1.0  NaN  1.0  NaN  1.0  1.0  1.0  a
2  1.0  1.0  1.0  1.0  1.0  NaN  1.0  NaN  1.0  1.0  a
3  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  NaN  1.0  a
4  1.0  NaN  1.0  1.0  1.0  1.0  1.0  NaN  1.0  1.0  a
5  1.0  1.0  1.0  1.0  1.0  1.0  NaN  1.0  1.0  1.0  a
6  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
7  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
8  1.0  NaN  1.0  1.0  1.0  1.0  NaN  1.0  1.0  1.0  a
9  NaN  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  NaN  a
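
Before applying the threshold, it can help to look at the intermediate values the filter works on. The following is a minimal sketch on a small made-up frame (the names `demo`, `x`, `y`, `label` and `nan_fraction` are illustrative and not part of the original answer); it shows that the mean of the boolean `isnull()` result is the per-column fraction of missing values, and that `select_dtypes(float)` leaves non-float columns out:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: two float columns and one string column.
demo = pd.DataFrame({
    "x": [1.0, np.nan, 1.0, 1.0],      # 1 of 4 values missing
    "y": [1.0, 1.0, 1.0, 1.0],         # nothing missing
    "label": ["a", "b", None, None],   # not float, skipped by select_dtypes
})

# isnull() yields a boolean frame; the mean of booleans is the share of True,
# i.e. the fraction of missing values in each float column.
nan_fraction = demo.select_dtypes(float).isnull().mean()
print(nan_fraction)
```

Here `x` gets 0.25 and `y` gets 0.0, while `label` does not appear at all, which is why the answer reindexes to the full set of columns afterwards.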

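threshold = 0.1667
# Fraction of NaN per float column; non-float columns are added back with
# fill_value=False (compares like 0), so they always pass the filter below.
df1 = df.select_dtypes(float).isnull().mean().reindex(df.columns, fill_value=False)
# Keep only the columns whose NaN fraction does not exceed the threshold.
df = df.loc[:, df1 <= threshold]
print (df)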
     0    2    3    4    5    8    9  A
0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
1  1.0  NaN  1.0  NaN  1.0  1.0  1.0  a
2  1.0  1.0  1.0  1.0  NaN  1.0  1.0  a
3  1.0  1.0  1.0  1.0  1.0  NaN  1.0  a
4  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
5  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
6  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
7  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
8  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
9  NaN  1.0  1.0  1.0  1.0  1.0  NaN  a
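
As for the TypeError in the question: in Python, `&` binds more tightly than `==` and `>`, so the condition is evaluated as `float & (data[f].isnull().sum() / data.shape[0])` first, which is a `type` combined with a `numpy.float64`. Using `and` (or parenthesising each comparison) avoids this. Below is a minimal sketch of a loop-style alternative; the helper name `drop_sparse_float_columns` and the `demo` data are hypothetical, and it collects the columns first rather than deleting while iterating:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype

def drop_sparse_float_columns(data, threshold):
    """Return a copy of `data` without float columns whose NaN share exceeds `threshold`."""
    to_drop = [
        col for col in data.columns
        # `and` short-circuits, so the NaN share is only computed for float columns.
        if is_float_dtype(data[col]) and data[col].isnull().mean() > threshold
    ]
    return data.drop(columns=to_drop)

# Hypothetical example data.
demo = pd.DataFrame({
    "mostly_nan": [np.nan, np.nan, 1.0, 1.0, 1.0, 1.0],  # 2 of 6 missing
    "complete": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],          # nothing missing
    "text": [None, None, None, "a", "b", "c"],           # not float, always kept
})

result = drop_sparse_float_columns(demo, threshold=0.1667)
print(list(result.columns))  # ['complete', 'text']
```

The vectorised select_dtypes approach above is usually preferable on wide frames; the loop form is shown only because it stays close to the code in the question.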

Regarding "python - How to drop float features whose % of NaN is above a certain number?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55200846/
