
python - How do I correctly do one-hot encoding with scikit?

Reposted. Author: 行者123. Updated: 2023-11-30 09:52:07

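One of my features is a categorical variable that can take 29 different states. I am trying to transform it with one-hot encoding so that I can use this feature to build a predictive model. Here is my code: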

enc = preprocessing.OneHotEncoder()
enc.fit([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28]])

subredditCategory = []
if row[1] == 'Art':
    subredditCategory.append(0)
elif row[1] == 'AskReddit':
    subredditCategory.append(1)
elif row[1] == 'askscience':
    subredditCategory.append(2)
elif row[1] == 'aww':
    subredditCategory.append(3)
elif row[1] == 'books':
    subredditCategory.append(4)
elif row[1] == 'creepy':
    subredditCategory.append(5)
elif row[1] == 'dataisbeautiful':
    subredditCategory.append(6)
elif row[1] == 'DIY':
    subredditCategory.append(7)
elif row[1] == 'Documentaries':
    subredditCategory.append(8)
elif row[1] == 'EarthPorn':
    subredditCategory.append(9)
elif row[1] == 'explainlikeimfive':
    subredditCategory.append(10)
elif row[1] == 'food':
    subredditCategory.append(11)
elif row[1] == 'funny':
    subredditCategory.append(12)
elif row[1] == 'gaming':
    subredditCategory.append(13)
elif row[1] == 'gifs':
    subredditCategory.append(14)
elif row[1] == 'history':
    subredditCategory.append(15)
elif row[1] == 'jokes':
    subredditCategory.append(16)
elif row[1] == 'LifeProTips':
    subredditCategory.append(17)
elif row[1] == 'movies':
    subredditCategory.append(18)
elif row[1] == 'music':
    subredditCategory.append(19)
elif row[1] == 'pics':
    subredditCategory.append(20)
elif row[1] == 'science':
    subredditCategory.append(21)
elif row[1] == 'ShowerThoughts':
    subredditCategory.append(22)
elif row[1] == 'space':
    subredditCategory.append(23)
elif row[1] == 'sports':
    subredditCategory.append(24)
elif row[1] == 'tifu':
    subredditCategory.append(25)
elif row[1] == 'todayilearned':
    subredditCategory.append(26)
elif row[1] == 'videos':
    subredditCategory.append(27)
elif row[1] == 'worldnews':
    subredditCategory.append(28)

sub = enc.transform([subredditCategory]).toarray()

features.append([row[2], row[3], row[6], row[8], sub])
labels.append(row[9])

But when I try to train a model with the features and labels, like this:

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

I get the following runtime error:

ValueError: setting an array element with a sequence.

It is raised by the clf.fit line. I'm not sure what I'm doing wrong. Any ideas?

Best answer

I believe that when you have categorical data, you also need to make use of LabelBinarizer or LabelEncoder.

You can use LabelEncoder as follows:

import sklearn.preprocessing

encoder = sklearn.preprocessing.OneHotEncoder()
label_encoder = sklearn.preprocessing.LabelEncoder()
# Map each category string to an integer code
data_labels_encoded = label_encoder.fit_transform(data['category_feature'])
data['category_feature'] = data_labels_encoded
# to_numpy() replaces as_matrix(), which was removed in pandas 1.0
feature = encoder.fit_transform(data[['category_feature']].to_numpy())
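As a side note, since scikit-learn 0.20 OneHotEncoder accepts string categories directly, so the integer-encoding step can often be skipped. The following is a minimal sketch under that assumption; the `categories` array is hypothetical sample data built from a few subreddit names in the question, not the asker's dataset.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample column: one categorical feature, one row per sample.
categories = np.array([["Art"], ["aww"], ["books"], ["aww"]])

# Since scikit-learn 0.20, OneHotEncoder can be fitted on strings directly.
# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")

# fit_transform returns a sparse matrix by default; toarray() makes it dense.
one_hot = encoder.fit_transform(categories).toarray()

print(encoder.categories_[0])  # categories learned from the data
print(one_hot.shape)           # (4, 3): 4 samples, 3 distinct categories
```

Fitting on the actual column, rather than hard-coding the integers 0 to 28, also removes the need for the long if/elif mapping.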

You can use LabelBinarizer as follows:

from sklearn import preprocessing

lb = preprocessing.LabelBinarizer()
feature = lb.fit_transform(data['category_feature'])
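To make the output of LabelBinarizer concrete, here is a small self-contained sketch. The `values` list is hypothetical and stands in for `data['category_feature']`; it is not data from the question.

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical sample values standing in for data['category_feature'].
values = ["funny", "aww", "pics", "aww"]

lb = LabelBinarizer()
feature = lb.fit_transform(values)

print(lb.classes_)  # classes learned from the data, in sorted order
print(feature)      # one row per sample, one indicator column per class

# inverse_transform maps the indicator rows back to the original labels.
restored = lb.inverse_transform(feature)
```

One caveat: when the column has only two distinct classes, LabelBinarizer returns a single column rather than two, so the output width is not always equal to the number of classes.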

I think the latter is the better approach, but that may depend on the situation.
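Whichever encoder is used, a likely cause of the ValueError in the question is that `sub` is a 2-D array appended as a single element of each feature row, so the rows are not flat vectors of numbers. The sketch below shows one way to avoid that by concatenating the numeric columns and the one-hot columns side by side. The `numeric`, `subreddits`, and `labels` values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: two numeric features per row, a subreddit name, a label.
numeric = np.array([[10.0, 1.0], [3.0, 0.0], [7.0, 1.0], [1.0, 0.0]])
subreddits = ["aww", "funny", "pics", "funny"]
labels = [1, 0, 1, 0]

# One indicator column per distinct subreddit: shape (4, 3) here.
one_hot = LabelBinarizer().fit_transform(subreddits)

# Concatenate column-wise so each row is a flat vector of 2 + 3 numbers,
# instead of a list that contains a nested array.
features = np.hstack([numeric, one_hot])

clf = DecisionTreeClassifier(random_state=0).fit(features, labels)
print(features.shape)  # (4, 5)
```

Building the whole matrix at once, instead of appending row by row, keeps every row the same length, which is what clf.fit expects.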

Regarding "python - How do I correctly do one-hot encoding with scikit?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43458331/
