
pandas - pd.Categorical.from_codes with missing values

Reposted. Author: 行者123. Updated: 2023-12-03 04:22:19

Suppose I have:

df = pd.DataFrame({'gender': np.random.choice([1, 2], 10), 'height': np.random.randint(150, 210, 10)})

I want to turn the gender column into a categorical. If I try:

df['gender'] = pd.Categorical.from_codes(df['gender'], ['female', 'male'])

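it fails, because the codes here start at 1 while `from_codes` expects 0-based positions into the categories list.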
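A minimal sketch of that failure and one possible workaround, assuming the codes are always 1 or 2. The fixed `codes` array is a stand-in for the random column in the question, so that the behavior is reproducible:

```python
import numpy as np
import pandas as pd

# Illustrative 1-based codes standing in for df['gender'] (assumed data).
codes = np.array([1, 2, 2, 1, 2])
categories = ["female", "male"]

try:
    pd.Categorical.from_codes(codes, categories)
    error = None
except ValueError as exc:
    # Code 2 is outside the valid range -1..len(categories)-1.
    error = exc

# Shifting to 0-based codes lets the same call succeed without padding.
gender = pd.Categorical.from_codes(codes - 1, categories)
print(error)
print(gender)
```

Subtracting 1 only works when the codes are consecutive integers starting at 1. For arbitrary codes, an explicit mapping is likely the safer route.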

I can pad the categories:

df['gender'] = pd.Categorical.from_codes(df['gender'], ['N/A', 'female', 'male'])

but then 'N/A' shows up in the output of some methods:

In [67]: df['gender'].value_counts()
Out[67]:
female    5
male      5
N/A       0
Name: gender, dtype: int64
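If the padding approach is kept, one option is to drop the placeholder afterwards with `cat.remove_unused_categories()`. The sketch below uses a small hand-written list of codes (an assumption, not the question's random data):

```python
import pandas as pd

# Illustrative 1-based codes; 'N/A' pads position 0 so the codes line up.
codes = [1, 2, 2, 1]
gender = pd.Series(pd.Categorical.from_codes(codes, ["N/A", "female", "male"]))

# Drop every category that has no rows, which removes the 'N/A' padding.
gender = gender.cat.remove_unused_categories()

print(gender.cat.categories)
print(gender.value_counts())
```

Note that this would also drop 'female' or 'male' if one of them happened not to occur in the data, so it fits best when every real category is known to be present.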

I considered using None as the padding value. It works as expected in value_counts, but I get a warning:

opt/anaconda3/bin/ipython:1: FutureWarning: 
Setting NaNs in `categories` is deprecated and will be removed in a future version of pandas.
#!/opt/anaconda3/bin/python

Is there a better way? And is there a way to explicitly give the mapping from codes to categories?

Best answer

You can use the rename_categories() method:

Demo:

In [33]: df
Out[33]:
   gender  height
0       1     203
1       2     169
2       2     181
3       1     172
4       2     174
5       1     166
6       2     187
7       2     200
8       1     208
9       1     201

In [34]: df['gender'] = df['gender'].astype('category').cat.rename_categories(['male','female'])

In [35]: df
Out[35]:
   gender  height
0    male     203
1  female     169
2  female     181
3    male     172
4  female     174
5    male     166
6  female     187
7  female     200
8    male     208
9    male     201

In [36]: df.dtypes
Out[36]:
gender    category
height       int32
dtype: object
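The question also asked for an explicit code-to-category mapping. `rename_categories()` accepts a dict-like as well as a list, so a sketch along these lines may be clearer than relying on the sorted order of the codes. The `code_to_label` name and the sample rows are illustrative, and the mapping follows the demo above (1 is male, 2 is female):

```python
import pandas as pd

# First four rows of the demo frame, written out by hand.
df = pd.DataFrame({"gender": [1, 2, 2, 1], "height": [203, 169, 181, 172]})

# Hypothetical explicit mapping from integer code to label.
code_to_label = {1: "male", 2: "female"}

# Convert to category first, then rename each existing category via the dict.
df["gender"] = df["gender"].astype("category").cat.rename_categories(code_to_label)

print(df)
print(df["gender"].cat.categories)
```

Categories missing from the dict are passed through unchanged, and `astype('category')` only creates categories for codes that actually occur in the column.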

For pandas - pd.Categorical.from_codes with missing values, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41779575/
