
python - Replacing missing and inconsistent values

Reposted. Author: 行者123. Updated: 2023-11-28 18:32:11

Given the following example:

import pandas as pd

df = pd.DataFrame({'Column A': ['null', 20, 30, 40, 'null'],
                   'Column B': [100, 'null', 30, 50, 'null']})

The link for the example

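I need a Python function that takes two columns and compares them:

  1. If one column has a missing value, fill it from the other column.

  2. If both values are "NULL", keep "NULL".

  3. If the values differ (are inconsistent), replace both values with "NULL".

  4. Return one attribute.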

After running the function, the data should look like this: the link for the result

This is what I have done so far; I need help implementing step 3:

def myFunction(firAttribute, secAttribute):
    # .ix has been removed from pandas; .loc plus .copy() selects the two columns safely
    x = df.loc[:, [firAttribute, secAttribute]].copy()
    x['new'] = x[firAttribute].fillna(x[secAttribute])
    x['new2'] = x[secAttribute].fillna(x[firAttribute])
    x['new'] = x['new'].fillna(x['new2'])
    return x['new']
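
A note on the attempt above: the sample data stores missing entries as the string 'null', which pandas does not treat as missing, so fillna leaves those cells untouched. The following is a minimal check, assuming the same values as 'Column A'; the variable names are illustrative only.

```python
import pandas as pd

# Same values as 'Column A' in the sample data.
s = pd.Series(['null', 20, 30, 40, 'null'])

# The string 'null' is an ordinary value, so nothing is reported as missing.
n_missing_raw = s.isna().sum()

# Replace the placeholder with NaN wherever the cell equals 'null'.
cleaned = s.mask(s == 'null')
n_missing_clean = cleaned.isna().sum()

print(n_missing_raw, n_missing_clean)
```

Only after the placeholder is converted do the two 'null' cells count as missing, which is why a conversion step has to come before any fillna or combine_first call.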

Best answer

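You can first replace 'null' with NaN, then use combine_first to fill each column's NaN from the other column, and finally use boolean indexing to select the rows where the two column values differ and set them to NaN: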

import pandas as pd
import numpy as np

df = pd.DataFrame({'Column A': ['null', 20, 30, 40, 'null'],
                   'Column B': [100, 'null', 30, 50, 'null']})
print(df)
  Column A Column B
0     null      100
1       20     null
2       30       30
3       40       50
4     null     null

# replace the 'null' placeholder with NaN
df = df.replace("null", np.nan)
print(df)
  Column A Column B
0      NaN      100
1       20      NaN
2       30       30
3       40       50
4      NaN      NaN

# fill each column's NaN from the other column
df['Column A'] = df['Column A'].combine_first(df['Column B'])
df['Column B'] = df['Column B'].combine_first(df['Column A'])
print(df)
  Column A Column B
0      100      100
1       20       20
2       30       30
3       40       50
4      NaN      NaN

# replace inconsistent values with NaN (NaN != NaN is True, so row 4 stays NaN)
df[df['Column A'] != df['Column B']] = np.nan
print(df)
  Column A Column B
0      100      100
1       20       20
2       30       30
3      NaN      NaN
4      NaN      NaN
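
The question also asks for a function that returns a single attribute, which the steps above do not package up. The following is one possible sketch, not the answer author's code: the name merge_columns and the null_token parameter are hypothetical, and Series.mask is used here in place of replace to convert the placeholder.

```python
import pandas as pd


def merge_columns(df, first, second, null_token="null"):
    """Hypothetical helper: reconcile two columns into one Series."""
    # Turn the placeholder string into NaN so pandas sees it as missing.
    a = df[first].mask(df[first] == null_token)
    b = df[second].mask(df[second] == null_token)

    # Rule 1: take the value from `a`, falling back to `b` where `a` is NaN.
    # Rule 2 follows for free: if both are NaN the result stays NaN.
    merged = a.combine_first(b)

    # Rule 3: where both values are present but differ, blank the result.
    conflict = a.notna() & b.notna() & (a != b)
    return merged.mask(conflict)


df = pd.DataFrame({'Column A': ['null', 20, 30, 40, 'null'],
                   'Column B': [100, 'null', 30, 50, 'null']})
result = merge_columns(df, 'Column A', 'Column B')
print(result)
```

On the sample data, rows 0 to 2 come out as 100, 20 and 30, while rows 3 and 4 are NaN. If the literal placeholder is wanted in the output, result.fillna('null') would restore it. The helper leaves the input DataFrame unchanged.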

Regarding python - Replacing missing and inconsistent values, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35894994/
