
python - Label encoding without NaN values

Reposted. Author: 行者123. Updated: 2023-12-01 08:00:46

I want to encode a categorical variable without encoding its missing values. So far I haven't found the right solution. Here is my code:


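import numpy as np
import pandas as pd
from sklearn import preprocessing

# Define my df:
df = pd.DataFrame({'A': ['X', np.nan, 'Z'], 'B': ['DB', 'AB', 'CA'], 'C': ['KH', 1, np.nan]})
print(df)
#      A   B    C
# 0    X  DB   KH
# 1  NaN  AB    1
# 2    Z  CA  NaN

# Encode only column A:
Le = preprocessing.LabelEncoder()
target = Le.fit_transform(df['A'].astype(str))

# But this also encodes the NaN values (astype(str) turns them into the string 'nan')

# Then I tried another approach, but it does not work:

Le = preprocessing.LabelEncoder()

# Select the rows where A is not null and try label encoding again:

Anotnull = df.loc[df['A'] != np.nan]
# This fails: NaN never compares equal, so df['A'] != np.nan is True for every row and
# nothing is filtered out; fit_transform also receives the whole 3-column frame instead
# of a single column, so LabelEncoder raises a ValueError.
target = Le.fit_transform(Anotnull.astype(str))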

The goal is to label-encode the column without touching the NaN values.
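One way to make the second attempt work, sketched here assuming scikit-learn's `LabelEncoder` and the `df` from the question: filter with `notna()` instead of `!= np.nan`, fit the encoder on the single column `A`, and write the codes back only into the non-missing rows. The output column name `A_encoded` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'A': ['X', np.nan, 'Z'], 'B': ['DB', 'AB', 'CA'], 'C': ['KH', 1, np.nan]})

# Rows where A is present; notna() is required because NaN != NaN is always True.
mask = df['A'].notna()

# Fit only on the non-missing values, so NaN never gets a label.
le = LabelEncoder()
encoded = le.fit_transform(df.loc[mask, 'A'])

# Hypothetical output column: start as all-NaN floats, then fill the encoded rows.
df['A_encoded'] = np.nan
df.loc[mask, 'A_encoded'] = encoded

print(df['A_encoded'].tolist())  # [0.0, nan, 1.0]
print(le.classes_)               # ['X' 'Z']
```

Because the encoder was fitted only on real categories, `le.inverse_transform(encoded)` maps the codes back to `['X', 'Z']`.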

Best answer

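Technically, this is not label encoding that "doesn't touch" NaN, but it leaves you with a label-encoded DataFrame in which the NaNs stay in their original positions.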

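import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df_raw = pd.DataFrame({"feature1": ["a", "b", "c", np.nan, "e"],
                       "feature2": ["h", "i", np.nan, "k", "l"]})

# 1st possibility: astype("str") turns NaN into the string "nan" (pandas 1.x/2.x), which
# is encoded like any other value; where() then puts NaN back in the originally missing cells
df_temp = df_raw.astype("str").apply(LabelEncoder().fit_transform)
df_final = df_temp.where(df_raw.notna())

# 2nd possibility: category codes mark NaN as -1; where() turns those cells back into NaN
df_temp = df_raw.astype("category").apply(lambda x: x.cat.codes)
df_final = df_temp.where(df_raw.notna())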
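A self-contained sketch that runs both possibilities on the answer's `df_raw` and checks that they produce the same codes. It assumes scikit-learn is installed and relies on `where(df_raw.notna())` filling the masked cells with NaN by default. The last part shows a caveat of the 1st possibility: the string `'nan'` is fitted as a real class, so unless it happens to sort last, the remaining codes skip a number. `cat.codes` never gives NaN a category code (it uses -1), so the 2nd possibility has no such gap.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df_raw = pd.DataFrame({"feature1": ["a", "b", "c", np.nan, "e"],
                       "feature2": ["h", "i", np.nan, "k", "l"]})

# 1st possibility: encode the stringified frame, then blank out the original NaN cells.
final_le = df_raw.astype("str").apply(LabelEncoder().fit_transform).where(df_raw.notna())

# 2nd possibility: category codes are -1 for NaN; blank those cells out the same way.
final_cat = df_raw.astype("category").apply(lambda s: s.cat.codes).where(df_raw.notna())

print(final_le)
# Compare as float64 so that dtype differences between the two results are ignored.
print(final_le.astype("float64").equals(final_cat.astype("float64")))  # True

# Caveat for the 1st possibility: 'nan' sorts before 'x', so it takes code 0
# and the real values become 1 and 2 once the 'nan' cell is blanked out.
stringified = pd.Series(["x", "nan", "z"])  # what astype(str) yields for ["x", NaN, "z"] in pandas 1.x/2.x
codes = pd.Series(LabelEncoder().fit_transform(stringified)).where(stringified != "nan")
print(codes.tolist())  # [1.0, nan, 2.0]
```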

Regarding "python - Label encoding without NaN values", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55745402/
