
python - Filling np.nan conditionally in a for/if-else loop

Reposted. Author: 行者123. Updated: 2023-12-03 21:33:51

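I have been working on this for a while, but I can't seem to find the answer I need. Suppose I have the dataframe below.

What I want to do is fill the last three rows of df['gender'] based on the values in the df['home_work'] column: 'm' if home_work > 9, otherwise 'f'. Keep in mind this is just a made-up dataset, and I promise no offense is meant to anyone!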

import numpy as np
import pandas as pd

enr = pd.DataFrame({
    'name_id': [1254, 1359, 1254, 1296, 1353, 2656],
    'enrollment_term': ['spring 2018', 'spring 2018', 'fall 2018', 'spring 2018', 'spring 2018', 'fall 2020'],
    'gpa_term': [2.93, np.nan, 1.65, 4.00, 3.95, 2.92],
    'dog_owner': [0, 1, 1, 1, 1, 0],
    'salary': [50657, 90658, np.nan, 104352, np.nan, 102043],
    'home_work': [34, np.nan, 12, 9, 8, 27],
    'gender': ['m', 'f', 'f', np.nan, np.nan, np.nan]})

enr

Below is the code I tried, but it raises the error shown underneath it:

for i in df['gender'].isna():
    if df['home_work'][i] > 9:
        df['gender'][i].fillna('m')
    else:
        df['gender'][i].fillna('f')

KeyError: False

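Any help would be greatly appreciated, as I have been at this for a while. I have a dataset of 90K+ rows and would like to adapt this into a function to streamline the process, but I've hit a speed bump!

The problem I'm running into is that rows with np.nan fall through to the default, so gender gets filled with a value even when the requirement is not met. Any ideas?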


# Edited

Suppose I have the following df:

enr = pd.DataFrame({
    'name_id': [1254, 1359, 1254, 1296, 1353, 2656],
    'enrollment_term': ['spring 2018', 'spring 2018', 'fall 2018', 'spring 2018', 'spring 2018', 'fall 2020'],
    'gpa_term': [2.93, np.nan, 1.65, 4.00, 3.95, 2.92],
    'dog_owner': [0, 1, 1, 1, 1, 0],
    'salary': [50657, 90658, np.nan, 104352, np.nan, 102043],
    'home_work': [np.nan, np.nan, 0.7, 0.3, 0.64, 0.49],
    'gender': [0, 1, 1, np.nan, np.nan, np.nan]})


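I want to impute enr['gender'] based on home_work: if enr['home_work'] >= 0.5, then enr['gender'] should be 0; otherwise (as long as enr['home_work'] is not np.nan), enr['gender'] should be 1.

What I do not want is any imputation in enr['gender'] for rows where enr['home_work'] is np.nan. I have tried many different techniques, but they all seem to impute 1 for those rows. Any ideas?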

Best answer

Use numpy.where together with Series.fillna:

enr['gender'] = np.where(enr['home_work'] > 9,
                         enr['gender'].fillna('m'),
                         enr['gender'].fillna('f'))
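
One detail worth keeping in mind with this comparison: NaN compared with any number evaluates to False, so a row whose home_work is missing lands in the 'f' branch instead of staying empty. A small sketch of that behaviour, using a hypothetical stand-alone Series named home_work rather than the full frame:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-alone column, mirroring home_work from the question.
home_work = pd.Series([34, np.nan, 12, 9, 8, 27])

# NaN > 9 is False, so the missing row is treated like a "not greater" row.
greater = home_work > 9
print(greater.tolist())  # [True, False, True, False, False, True]

# np.where therefore assigns 'f' to the row with a missing home_work value.
labels = np.where(greater, 'm', 'f')
print(labels.tolist())  # ['m', 'f', 'm', 'f', 'f', 'm']
```

In the first dataset this does not change the result, because the row with a missing home_work already has a gender, but it matters as soon as both columns can be missing in the same row.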

Or build the mask separately and filter both sides with it:

m = enr['gender'].isna()
enr.loc[m, 'gender'] = np.where(enr['home_work'] > 9, 'm', 'f')[m]

print(enr)
   name_id enrollment_term  gpa_term  dog_owner    salary  home_work gender
0     1254     spring 2018      2.93          0   50657.0         34      m
1     1359     spring 2018       NaN          1   90658.0         42      f
2     1254       fall 2018      1.65          1       NaN         12      f
3     1296     spring 2018      4.00          1  104352.0          9      f
4     1353     spring 2018      3.95          1       NaN          8      f
5     2656       fall 2020      2.92          0  102043.0         27      m
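
As for the KeyError: False in the question, the most likely cause is that iterating over df['gender'].isna() yields the booleans stored in the mask, not row labels, so df['home_work'][i] ends up looking for a label named False. The sketch below, on hypothetical stand-alone copies of the two columns, shows what the loop variable actually receives and how the same mask selects rows when it is used as an indexer instead:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-alone copies of the two columns from the question.
home_work = pd.Series([34, np.nan, 12, 9, 8, 27])
gender = pd.Series(['m', 'f', 'f', np.nan, np.nan, np.nan])

mask = gender.isna()

# A for loop over the mask sees its values (booleans), not the index.
loop_values = list(mask)
print(loop_values)  # [False, False, False, True, True, True]

# Used as a boolean indexer, the mask selects the rows that need filling.
rows_to_fill = mask[mask].index.tolist()
print(rows_to_fill)  # [3, 4, 5]
print(home_work[mask].tolist())  # [9.0, 8.0, 27.0]
```

This is why the answer works on whole columns with a mask rather than looping row by row.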

Edit:

m = enr['gender'].isna() & enr['home_work'].notna()
enr.loc[m, 'gender'] = np.where(enr['home_work'] >= 0.5, 0, 1)[m]
print(enr)
   name_id enrollment_term  gpa_term  dog_owner    salary  home_work  gender
0     1254     spring 2018      2.93          0   50657.0        NaN     0.0
1     1359     spring 2018       NaN          1   90658.0        NaN     1.0
2     1254       fall 2018      1.65          1       NaN       0.70     1.0
3     1296     spring 2018      4.00          1  104352.0       0.30     1.0
4     1353     spring 2018      3.95          1       NaN       0.64     0.0
5     2656       fall 2020      2.92          0  102043.0       0.49     1.0
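
Since the question mentions wrapping this in a function for a larger dataset, here is one possible sketch. The function name impute_by_threshold and its parameters are hypothetical, not part of pandas or of the answer above; it only packages the same mask-plus-np.where idea and leaves a row untouched when its source value is missing.

```python
import numpy as np
import pandas as pd

def impute_by_threshold(df, target, source, threshold, if_ge, if_lt):
    """Fill NaNs in `target` from `source`. Hypothetical helper, not a pandas API."""
    out = df.copy()
    # Only rows where target is missing and source is present get filled.
    mask = out[target].isna() & out[source].notna()
    filled = np.where(out[source] >= threshold, if_ge, if_lt)
    out.loc[mask, target] = filled[mask.to_numpy()]
    return out

# Reduced version of the edited dataset, keeping only the two relevant columns.
enr = pd.DataFrame({
    'home_work': [np.nan, np.nan, 0.7, 0.3, 0.64, 0.49],
    'gender': [0, 1, 1, np.nan, np.nan, np.nan],
})
result = impute_by_threshold(enr, 'gender', 'home_work', 0.5, if_ge=0, if_lt=1)
print(result['gender'].tolist())  # [0.0, 1.0, 1.0, 1.0, 0.0, 1.0]
```

The helper returns a copy rather than modifying the frame in place, which is a design choice for the sketch and easy to change if in-place updates are preferred.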

Regarding python - Filling np.nan conditionally in a for/if-else loop, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60471025/
