
python - ValueError: setting an array element with a sequence. Decision tree

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:14:17

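I think the problem lies with my variable "info.venue". It holds string values, which I encoded with LabelEncoder and OneHotEncoder. But when I try to fit a decision tree, I get an error. With only the other two variables it works like a charm, but as soon as I include the one-hot-encoded "info.venue", I get the error below.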

The error is "ValueError: setting an array element with a sequence".

info.toss.decision  info.toss.winner  info.venue
field               Australia         Shere Bangla National Stadium
field               Australia         Adelaide Oval
field               Australia         Melbourne Cricket Ground
bat                 Australia         Brabourne Stadium
bat                 Australia         Melbourne Cricket Ground
bat                 Australia         Sydney Cricket Ground
bat                 Australia         Punjab Cricket Association
field               India             Kensington Oval, Bridgetown
field               India             Stadium Australia
field               India             Saurashtra Cricket Association Stadium
bat                 India             Kingsmead
bat                 India             Melbourne Cricket Ground
bat                 India             R Premadasa Stadium

The code is as follows:

Encoding the data with LabelEncoder and OneHotEncoder

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder = LabelEncoder()
onehotencoder = OneHotEncoder()
df['info.toss.decision'] = labelencoder.fit_transform(df['info.toss.decision'])
df['info.toss.winner'] = labelencoder.fit_transform(df['info.toss.winner'])
df['info.outcome.winner'] = labelencoder.fit_transform(df['info.outcome.winner'])
df['info.venue'] = labelencoder.fit_transform(df['info.venue'])
df['info.venue'] = onehotencoder.fit_transform(df[['info.venue']])

Selecting specific columns from the DataFrame

X = df[['info.venue','info.toss.decision','info.toss.winner']]
Y = df[['info.outcome.winner']]

Splitting the dataset into a training set and a test set

from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.25)

Fitting a decision tree classifier to the training set

from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'gini', random_state = 0)
classifier.fit(X_train, y_train)

The "info.venue" column looks like this:

info.venue

Kingsmead
Melbourne Cricket Ground
Brabourne Stadium
Kensington Oval, Bridgetown
Stadium Australia
Melbourne Cricket Ground
R Premadasa Stadium
Saurashtra Cricket Association Stadium
Shere Bangla National Stadium
Adelaide Oval
Melbourne Cricket Ground
Sydney Cricket Ground
Punjab Cricket Association IS Bindra Stadium, Mohali

Best answer

This error occurs because you are trying to assign a two-dimensional array to a single pandas column.

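By default, OneHotEncoder returns a sparse matrix, which pandas treats as an array of objects. pandas therefore accepts the assignment and broadcasts the complete 2D object to every row of the DataFrame. The error is then thrown when the decision tree is fitted.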
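To see why one column cannot hold the encoder output, here is a minimal sketch. It assumes scikit-learn 0.20 or later (where OneHotEncoder accepts string columns directly) and uses a four-row toy frame, not the asker's data.

```python
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

# Toy data (assumption): four rows, three distinct venues.
venues = pd.DataFrame(
    {"info.venue": ["Kingsmead", "Adelaide Oval", "Kingsmead", "Stadium Australia"]}
)

encoded = OneHotEncoder().fit_transform(venues[["info.venue"]])

# The result is a SciPy sparse matrix with one column per distinct venue,
# so it has more columns than a single DataFrame column can store.
is_sparse = sparse.issparse(encoded)
encoded_shape = encoded.shape
print(is_sparse, encoded_shape)
```

Calling .toarray() on the result gives a dense NumPy array of the same shape, which can then be spread across several DataFrame columns.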

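So you need to change it to:

import numpy as np
ohe_data = onehotencoder.fit_transform(df[['info.venue']]).toarray()
# Add one new column per one-hot category (n_values_ is gone in recent scikit-learn, so use the array width)
for i in np.arange(ohe_data.shape[1]):
    df['infovenue_one_coded_' + str(i)] = ohe_data[:, i]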

Then drop the original column from the DataFrame:

new_df = df.drop('info.venue', axis=1)

Then pass this new_df to the decision tree.
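Put together, the steps above might look like the following sketch. The six-row frame and its already label-encoded integer values are made up for illustration, and the train/test split is left out to keep it short.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data (assumption): integer columns stand in for label-encoded values.
df = pd.DataFrame({
    "info.venue": ["Kingsmead", "Adelaide Oval", "Kingsmead",
                   "Stadium Australia", "Adelaide Oval", "Stadium Australia"],
    "info.toss.decision": [0, 1, 1, 0, 0, 1],
    "info.outcome.winner": [0, 1, 0, 1, 1, 0],
})

# Densify the encoder output and give each category its own column.
onehotencoder = OneHotEncoder()
ohe_data = onehotencoder.fit_transform(df[["info.venue"]]).toarray()
for i in range(ohe_data.shape[1]):
    df["infovenue_one_coded_" + str(i)] = ohe_data[:, i]

# Drop the original string column, then separate features and target.
new_df = df.drop("info.venue", axis=1)
X = new_df.drop("info.outcome.winner", axis=1)
y = new_df["info.outcome.winner"]

classifier = DecisionTreeClassifier(criterion="gini", random_state=0)
classifier.fit(X, y)
predictions = classifier.predict(X)
```

Every feature column is now numeric and one-dimensional, which is what the classifier expects.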

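Update:

Since you one-hot encode the data first and only then split it into training and test sets, I would suggest using pd.get_dummies(), which replaces both LabelEncoder and OneHotEncoder in your code.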

Replace these lines:

df['info.venue']=labelencoder.fit_transform(df['info.venue'])
df['info.venue']=onehotencoder.fit_transform(df[['info.venue']])

with:

new_df = pd.concat([df, pd.get_dummies(df['info.venue'])], axis=1)
new_df = new_df.drop('info.venue', axis=1)
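As a rough illustration of the pd.get_dummies() route, here is a sketch on a three-row toy frame. The prefix="venue" argument is my own addition to make the new column names easier to read. It is not part of the original answer.

```python
import pandas as pd

# Toy data (assumption): three rows, two distinct venues.
df = pd.DataFrame({
    "info.toss.decision": ["field", "bat", "bat"],
    "info.venue": ["Adelaide Oval", "Kingsmead", "Adelaide Oval"],
})

# One indicator column per venue, named "venue_<venue name>".
dummies = pd.get_dummies(df["info.venue"], prefix="venue")

# Replace the string column with its indicator columns.
new_df = pd.concat([df.drop("info.venue", axis=1), dummies], axis=1)
print(new_df.columns.tolist())
```

The indicator columns are boolean or small-integer typed depending on the pandas version. Either way the decision tree can consume them once the remaining string columns are encoded too.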

Regarding "python - ValueError: setting an array element with a sequence. Decision tree", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48305143/
