python - ValueError with LabelEncoder and OneHotEncoder

Reposted. Author: 行者123. Updated: 2023-12-01 02:49:20

I am trying to convert a categorical string column into several binary dummy-variable columns, but I get a ValueError.

The code is as follows:

import sys, os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from dateutil import parser
import math
import traceback
import logging
datasetMod = pd.read_csv('data.csv')

X = datasetMod.iloc[:, 3:6].values
y = datasetMod.iloc[:, 1].values
print(X[:, 0])

# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
try:
    labelencoder_X = LabelEncoder()
    X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
    onehotencoder = OneHotEncoder(categorical_features = [0])
    X = onehotencoder.fit_transform(X).toarray()
except Exception as e:
    exc_type, exc_obj, exc_tb = sys.exc_info()
    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
    print(exc_type, fname, exc_tb.tb_lineno)

The error is:

<class 'ValueError'> multipleLinearRegression.py 23

The print statement for that column outputs:

['Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Weekend' 'Workday' 'Workday' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend']

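The strings themselves look fine: there are no stray spaces, digits, or other symbols in them. So I don't understand why I get a ValueError saying that a string could not be converted to float.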

Any help would be greatly appreciated.
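
One plausible cause, offered as a guess rather than a confirmed diagnosis: X is built from three columns (3:6), but only column 0 is label-encoded. Older OneHotEncoder versions convert the whole matrix to a numeric type before encoding, so any remaining string column would trigger this error. The sketch below reproduces that conversion with NumPy alone; the column values are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for X: three feature columns, all strings at first.
X = np.array(
    [["Workday", "Morning", "Rain"],
     ["Weekend", "Evening", "Sun"]],
    dtype=object,
)

# Pretend column 0 has already been label-encoded to integers.
X[:, 0] = [1, 0]

try:
    # Roughly what a numeric-only encoder has to do with the full matrix.
    X.astype(float)
    message = ""
except ValueError as err:
    message = str(err)
    print(message)
```

If this is indeed the cause, label-encoding every string column before one-hot encoding should make the error go away.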

Update

OneHotEncoder now works, but the final result has dtype object when it should be float64:

labelencoder_X = LabelEncoder()
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
X[:, 2] = labelencoder_X.fit_transform(X[:, 2])
X[:, 3] = labelencoder_X.fit_transform(X[:, 3])
onehotencoder = OneHotEncoder(categorical_features = [1,2,3])
onehotencoder.fit(X[:, 1])
onehotencoder.fit(X[:, 2])
onehotencoder.fit(X[:, 3])
onehotencoder.transform(X[:, 1])
onehotencoder.transform(X[:, 2])
onehotencoder.transform(X[:, 3])
X = onehotencoder.toArray()

Update 2

from sklearn.preprocessing import LabelEncoder, OneHotEncoder

labelencoder_X = LabelEncoder()
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
X[:, 2] = labelencoder_X.fit_transform(X[:, 2])
X[:, 3] = labelencoder_X.fit_transform(X[:, 3])

onehotencoder = OneHotEncoder(categorical_features = [1,2,3])
X[:, 1] = onehotencoder.fit_transform(X[:, 1]).toarray()
X[:, 2] = onehotencoder.fit_transform(X[:, 2]).toarray()
X[:, 3] = onehotencoder.fit_transform(X[:, 3]).toarray()

print(X.dtype) #object
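
A likely explanation for the object dtype, assuming X came from `.values` on columns that contained strings: the array is created with dtype object, and assigning numbers into its columns in place never changes that dtype. A minimal sketch with made-up values:

```python
import numpy as np

# Hypothetical two-column array that starts out holding strings.
X = np.array([["Workday", "a"], ["Weekend", "b"]], dtype=object)

# In-place assignment replaces the values but keeps the array's dtype.
X[:, 0] = [1, 0]
X[:, 1] = [0, 1]
print(X.dtype)

# Once every entry is numeric, an explicit cast yields a float array.
X_float = X.astype(np.float64)
print(X_float.dtype)
```

An explicit `astype` is only safe after all string columns have been encoded; otherwise it raises a ValueError.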

Final code

Since categorical_features already specifies the column indices, I can call fit_transform() on the whole matrix X. Thanks to @mkos for the patience!

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
X[:, 2] = labelencoder_X.fit_transform(X[:, 2])
X[:, 3] = labelencoder_X.fit_transform(X[:, 3])
onehotencoder = OneHotEncoder(categorical_features = [1,2,3])
X = onehotencoder.fit_transform(X)
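
Note for readers on newer scikit-learn: the `categorical_features` argument was removed in version 0.22, so the code above only runs on older releases. A rough equivalent, sketched here with `ColumnTransformer` on made-up data (the column layout and values are assumptions, not the asker's dataset), selects the columns to encode by index and passes the rest through:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: column 0 is numeric, columns 1-3 are categorical strings.
X = np.array(
    [[1.5, "Workday", "Morning", "Rain"],
     [2.0, "Weekend", "Evening", "Sun"],
     [0.5, "Workday", "Evening", "Rain"]],
    dtype=object,
)

ct = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(), [1, 2, 3])],
    remainder="passthrough",  # keep the non-categorical column
    sparse_threshold=0,       # always return a dense array
)

# One-hot columns come first, passthrough columns last.
X_encoded = ct.fit_transform(X).astype(np.float64)
print(X_encoded.shape)
print(X_encoded.dtype)
```

With this approach OneHotEncoder handles the strings directly, so the separate LabelEncoder step is not needed.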

Best answer

This should solve the problem:

onehotencoder = OneHotEncoder(categorical_features = [1,2,3])
X = onehotencoder.fit_transform(X)

You can print it with:

print(X.toarray())

Keeping X as a sparse matrix is also fine, since it saves memory. If you want to inspect it, use toarray() to convert it to a regular np.array.
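
As a small illustration of the sparse-to-dense conversion, the sketch below builds a SciPy sparse matrix by hand (the values are made up, standing in for one-hot output) and converts it. Note that the SciPy method is spelled `toarray`, all lowercase.

```python
import numpy as np
from scipy import sparse

# Hypothetical one-hot style data stored as a sparse CSR matrix.
dense = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
X_sparse = sparse.csr_matrix(dense)

# toarray() returns a regular dense NumPy array with the same values.
X_dense = X_sparse.toarray()
print(type(X_dense).__name__)
print(X_dense.dtype)
```

For large one-hot matrices, most entries are zero, so converting to dense only for inspection keeps memory use low.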

Regarding python - ValueError with LabelEncoder and OneHotEncoder, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44955384/
