
python - Replace NaN in a pandas DataFrame based on row entries

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:31:38

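I have a DataFrame in which each row represents one visit to a doctor and each column holds the data from one diagnostic test. The data are incomplete, and missing values are filled with NaN.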

Here is a simplified example:

   AGE  Height     SEX  Weight
0   79      40    Male      90
1   79      21    Male      20
2   79     NaN    Male      50
3   79      89    Male     NaN
4   79      90    Male      57
5   81      87  Female     NaN
6   81     NaN  Female      89
7   81      54  Female      79
8   81      21  Female     NaN
9   81      23  Female      23
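To follow along, the example table can be rebuilt as a DataFrame. This is a minimal sketch that assumes the missing entries are real NaN values (np.nan) rather than strings; the variable name df matches the one used in the rest of the post.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; np.nan marks the missing measurements.
df = pd.DataFrame({
    "AGE": [79] * 5 + [81] * 5,
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

# Count the missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Because the columns contain NaN, pandas stores Height and Weight as floats even though the observed values are whole numbers.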

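I want to replace each NaN with the overall mean for patients of the same sex and age. I have been able to create a DataFrame containing the means for each combination of AGE and SEX, as follows: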

age_sex_means = df.groupby(['SEX', 'AGE'])[['Height', 'Weight']].mean()

This produces the following DataFrame:

            Height  Weight
SEX    AGE
Female 81     37.0    38.2
Male   79     48.0    43.4
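However, I cannot find a way to replace the NaNs in the first DataFrame with the means contained in the second one. The question "Using Pandas to fill NaN entries based on values in a different column, using a dictionary as a guide" seems to address a situation similar to mine, but it uses only a single index, so it does not obviously apply to my specific case.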
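One way to use a table of group means directly is to look up each row's (SEX, AGE) pair in it and fill from the result. The sketch below is only an illustration under assumptions: the sample data is rebuilt with real NaN values, and the names cols, row_means and filled, as well as the "_mean" suffix, are made up here. With real NaN values, mean() skips the missing entries, so the group means come out as 60.0 and 54.25 for Male/79 and 46.25 and about 63.67 for Female/81; the figures in the table above appear consistent with missing entries having been counted as 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "AGE": [79] * 5 + [81] * 5,
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

cols = ["Height", "Weight"]
# Group means indexed by (SEX, AGE); NaN values are skipped by default.
age_sex_means = df.groupby(["SEX", "AGE"])[cols].mean()

# Attach each row's group means by matching its SEX and AGE columns
# against the two index levels of age_sex_means.
row_means = df.join(age_sex_means, on=["SEX", "AGE"], rsuffix="_mean")

# Fill only the missing cells, leaving the original frame untouched.
filled = df.copy()
for col in cols:
    filled[col] = df[col].fillna(row_means[col + "_mean"])

print(filled)
```

The order of the on list has to match the order of the index levels produced by groupby.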

Best answer

Option 1
You can use apply together with fillna

df.groupby(['AGE', 'SEX'], group_keys=False).apply(lambda x: x.fillna(x.mean(numeric_only=True)))

   AGE  Height     SEX     Weight
0   79   40.00    Male  90.000000
1   79   21.00    Male  20.000000
2   79   60.00    Male  50.000000
3   79   89.00    Male  54.250000
4   79   90.00    Male  57.000000
5   81   87.00  Female  63.666667
6   81   46.25  Female  89.000000
7   81   54.00  Female  79.000000
8   81   21.00  Female  63.666667
9   81   23.00  Female  23.000000
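How apply treats the grouping columns has changed across pandas versions, so a variant that applies the fill to the measurement columns only may be more portable. This is a sketch, not the answer's exact code; cols and filled are names chosen here, and the sample data is rebuilt assuming real NaN values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "AGE": [79] * 5 + [81] * 5,
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

cols = ["Height", "Weight"]
filled = df.copy()

# Select the numeric columns before apply, so each group passed to the
# lambda contains only Height and Weight. The assignment aligns on the
# row index, so the original row order is preserved.
filled[cols] = (
    df.groupby(["AGE", "SEX"], group_keys=False)[cols]
      .apply(lambda g: g.fillna(g.mean()))
)

print(filled)
```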

Option 2
Use transform with combine_first, which produces a copy

df.combine_first(df.groupby(['SEX', 'AGE']).transform('mean'))

   AGE  Height     SEX     Weight
0   79   40.00    Male  90.000000
1   79   21.00    Male  20.000000
2   79   60.00    Male  50.000000
3   79   89.00    Male  54.250000
4   79   90.00    Male  57.000000
5   81   87.00  Female  63.666667
6   81   46.25  Female  89.000000
7   81   54.00  Female  79.000000
8   81   21.00  Female  63.666667
9   81   23.00  Female  23.000000
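To see what transform contributes here, it helps to look at its result on its own: a frame with the same row index as df, where every row carries the mean of its own group. The following sketch assumes the sample data with real NaN values; group_means and filled are names chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "AGE": [79] * 5 + [81] * 5,
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

# One row per original row, holding that row's group mean.
# The grouping columns SEX and AGE are not part of the result.
group_means = df.groupby(["SEX", "AGE"]).transform("mean")
print(group_means)

# combine_first keeps df's values and takes group_means only where df is NaN.
filled = df.combine_first(group_means)

# df itself is unchanged: combine_first returns a new frame.
print(df["Height"].isna().sum(), filled["Height"].isna().sum())
```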

Option 3
The same, using fillna

df.fillna(df.groupby(['SEX', 'AGE']).transform('mean'))

   AGE  Height     SEX     Weight
0   79   40.00    Male  90.000000
1   79   21.00    Male  20.000000
2   79   60.00    Male  50.000000
3   79   89.00    Male  54.250000
4   79   90.00    Male  57.000000
5   81   87.00  Female  63.666667
6   81   46.25  Female  89.000000
7   81   54.00  Female  79.000000
8   81   21.00  Female  63.666667
9   81   23.00  Female  23.000000
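One edge case worth knowing with any of these group-mean fills: if a group has no observed value at all for a column, its mean is itself NaN and the gap stays. The sketch below illustrates this with a hypothetical extra row that is not in the original data; the fallback to the overall column mean at the end is just one possible choice, not something the answer proposes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "AGE": [79] * 5 + [81] * 5,
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

# Hypothetical extra visit (not in the original data): the only
# 90-year-old male patient, with no Height recorded.
extra = pd.DataFrame(
    {"AGE": [90], "Height": [np.nan], "SEX": ["Male"], "Weight": [70.0]},
    index=[10],
)
df2 = pd.concat([df, extra])

cols = ["Height", "Weight"]
group_means = df2.groupby(["SEX", "AGE"])[cols].transform("mean")

filled = df2.copy()
filled[cols] = df2[cols].fillna(group_means)

# The (Male, 90) group has no Height values, so one gap remains.
remaining = int(filled["Height"].isna().sum())
print(remaining)

# Possible fallback (an assumption): the mean of all observed values in the column.
filled[cols] = filled[cols].fillna(df2[cols].mean())
print(filled.loc[10])
```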

Option 4
Or edit in place with update; pass overwrite=False so that only the NaN cells are filled

df.update(df.groupby(['SEX', 'AGE']).transform('mean'), overwrite=False)
df

   AGE  Height     SEX     Weight
0   79   40.00    Male  90.000000
1   79   21.00    Male  20.000000
2   79   60.00    Male  50.000000
3   79   89.00    Male  54.250000
4   79   90.00    Male  57.000000
5   81   87.00  Female  63.666667
6   81   46.25  Female  89.000000
7   81   54.00  Female  79.000000
8   81   21.00  Female  63.666667
9   81   23.00  Female  23.000000
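A note on update: by default it overwrites every cell for which the other frame has a non-NaN value, so the overwrite argument decides whether existing measurements survive. The sketch below contrasts the two settings on copies of the sample data (rebuilt with real NaN values); overwritten and patched are names chosen here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "AGE": [79] * 5 + [81] * 5,
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

group_means = df.groupby(["SEX", "AGE"])[["Height", "Weight"]].transform("mean")

# Default overwrite=True: every Height and Weight becomes its group mean.
overwritten = df.copy()
overwritten.update(group_means)

# overwrite=False: only cells that are NaN in the original are filled.
patched = df.copy()
patched.update(group_means, overwrite=False)

print(overwritten.loc[0, "Height"], patched.loc[0, "Height"])
```

update modifies the frame it is called on and returns None, which is why the result is inspected through the frame itself afterwards.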

Regarding "python - Replace NaN in a pandas DataFrame based on row entries", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47244021/
