
python - Making DictVectorizer treat numeric values as nominal

Reposted. Author: 行者123. Updated: 2023-12-01 05:19:01

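I have a dataset that includes each student's graduation year as an attribute. Such an attribute is nominal, of course, but DictVectorizer in scikit-learn converts a value like 1988 to a number. How can I make DictVectorizer treat it as nominal?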

Best answer

You can pass the year values as strings, e.g. {'year': '1998'} instead of {'year': 1998}. According to the DictVectorizer documentation:

When feature values are strings, this transformer will do a binary one-hot (aka one-of-K) coding: one boolean-valued feature is constructed for each of the possible string values that the feature can take on. For instance, a feature “f” that can take on the values “ham” and “spam” will become two features in the output, one signifying “f=ham”, the other “f=spam”.

An example:

from sklearn.feature_extraction import DictVectorizer

# Numeric values are passed through as a single numeric feature.
d_numerical = [{'year': 1997},
               {'year': 1998},
               {'year': 1999}]
print(DictVectorizer().fit_transform(d_numerical).toarray())

# String values are one-hot encoded, one column per distinct value.
d_categorical = [{'year': '1997'},
                 {'year': '1998'},
                 {'year': '1999'}]
print(DictVectorizer().fit_transform(d_categorical).toarray())

Output:

[[ 1997.]
 [ 1998.]
 [ 1999.]]
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]

The second case appears to be what you want.

Regarding "python - Making DictVectorizer treat numeric values as nominal", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22737145/
