python - ValueError: could not convert string to float: med

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:38:45

I am writing a very simple script. All I want to do is read some data with pandas and then train a decision tree on it. The data I am using is:

https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data

Here is my script:

import numpy as np
from sklearn.cross_validation import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
from sklearn import preprocessing
import pandas as pd
balance_data=pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data",
sep= ',', header= None)
#print "Dataset:: "

#df1.head()

X = balance_data.values[:, 0:5]
Y = balance_data.values[:,6]
X_train, X_test, y_train, y_test = train_test_split( X, Y, test_size = 0.2, random_state = 100)
clf_gini = DecisionTreeClassifier(criterion = "gini", random_state = 100,
max_depth=3, min_samples_leaf=5)

clf_gini.fit(X_train, y_train)

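Judging from the error, I guess it cannot convert the attribute value "med" to a float. Looking at the data, my wild guess is that "low" has a space in front of it while "med" does not, and that this is what confuses it. But I am not sure. Please tell me what might be wrong. PS: the error occurs on the last line; here is the traceback: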

ValueError                                Traceback (most recent call last)
<ipython-input-26-b495e5f26174> in <module>()
18 max_depth=3, min_samples_leaf=5)
19 X_train[X_train != '']
---> 20 clf_gini.fit(X_train, y_train)

/home/fatima/anaconda2/lib/python2.7/site-packages/sklearn/tree/tree.pyc in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
788 sample_weight=sample_weight,
789 check_input=check_input,
--> 790 X_idx_sorted=X_idx_sorted)
791 return self
792

/home/fatima/anaconda2/lib/python2.7/site-packages/sklearn/tree/tree.pyc in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
114 random_state = check_random_state(self.random_state)
115 if check_input:
--> 116 X = check_array(X, dtype=DTYPE, accept_sparse="csc")
117 y = check_array(y, ensure_2d=False, dtype=None)
118 if issparse(X):

/home/fatima/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
400 force_all_finite)
401 else:
--> 402 array = np.array(array, dtype=dtype, order=order, copy=copy)
403
404 if ensure_2d:

ValueError: could not convert string to float: med

Best answer

The dataset looks like this:

       0      1  2  3      4     5      6
0 vhigh vhigh 2 2 small low unacc
1 vhigh vhigh 2 2 small med unacc
2 vhigh vhigh 2 2 small high unacc
3 vhigh vhigh 2 2 med low unacc
4 vhigh vhigh 2 2 med med unacc

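Every column has dtype object, meaning the values are strings. Whitespace is not the problem: as the traceback shows, check_array tries to convert every feature value to float, and a string such as "med" cannot be converted. scikit-learn estimators such as DecisionTreeClassifier expect numeric input (int or float), so you need to encode the data before using it for training.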
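The failing conversion can be reproduced in isolation. The following is a minimal sketch, not part of the original script: it takes one row copied from the table above and attempts the same kind of float conversion that the traceback shows inside check_array. The variable names are illustrative.

```python
import numpy as np

# One row of the car dataset as pandas loads it: every field is a string.
row = ["vhigh", "vhigh", "2", "2", "small", "med"]

try:
    # The tree estimator's input validation attempts a float conversion like this one.
    np.array(row, dtype=np.float32)
    error_message = None
except ValueError as exc:
    # Strings such as "vhigh" or "med" have no float representation.
    error_message = str(exc)

print(error_message)
```

Only the strings "2" could be converted on their own; any categorical label in the row is enough to make the whole conversion fail.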

There are several ways to encode your data. One is label encoding; to use it, add the following lines to your code after loading the dataset:

le = preprocessing.LabelEncoder()
balance_data = balance_data.apply(le.fit_transform)

The data in balance_data now looks like this:

   0  1  2  3  4  5  6
0 3 3 0 0 2 1 2
1 3 3 0 0 2 2 2
2 3 3 0 0 2 0 2
3 3 3 0 0 1 1 2
4 3 3 0 0 1 2 2

where all dtypes are int.
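One thing to be aware of: because apply refits the same LabelEncoder on each column in turn, the encoder object only remembers the mapping of the last column it saw. If you want to look up which integer stands for which label, or decode values later, one option is to keep a separate encoder per column. The sketch below assumes that approach and uses the five rows shown above, typed in by hand so that it runs offline; the names sample, encoders and encoded are illustrative. Since the sample contains only some of the categories, the integers differ from those of the full dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# First five rows of the dataset, copied from the table above.
sample = pd.DataFrame(
    [
        ["vhigh", "vhigh", "2", "2", "small", "low", "unacc"],
        ["vhigh", "vhigh", "2", "2", "small", "med", "unacc"],
        ["vhigh", "vhigh", "2", "2", "small", "high", "unacc"],
        ["vhigh", "vhigh", "2", "2", "med", "low", "unacc"],
        ["vhigh", "vhigh", "2", "2", "med", "med", "unacc"],
    ]
)

encoders = {}            # column label -> fitted LabelEncoder
encoded = sample.copy()
for column in sample.columns:
    encoder = LabelEncoder()
    encoded[column] = encoder.fit_transform(sample[column])
    encoders[column] = encoder

# LabelEncoder assigns integers in sorted (alphabetical) label order.
safety_classes = list(encoders[5].classes_)
# The stored encoder can map the integers back to the original labels.
decoded_safety = list(encoders[5].inverse_transform(encoded[5]))

print(safety_classes)
print(decoded_safety)
```

Here classes_ holds the labels in the order of their integer codes, so index 0 of safety_classes is the label encoded as 0.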

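In general, you need to do some data preprocessing before training or fitting a model. I recommend working through a tutorial on data preprocessing to understand the process.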


Here is the complete code with the fix:

import pandas as pd
from sklearn import preprocessing
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

balance_data = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data",
    sep=",", header=None)

# Encode every column (features and target) as integers
le = preprocessing.LabelEncoder()
balance_data = balance_data.apply(le.fit_transform)

# Columns 0-5 are the six features and column 6 is the class label
# (the slice 0:5 in the question leaves out column 5)
X = balance_data.values[:, 0:6]
Y = balance_data.values[:, 6]
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=100)

clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
                                  max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)
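Label encoding is only one of the several ways mentioned above. As an alternative sketch, not taken from the answer itself, scikit-learn's OrdinalEncoder can encode all feature columns at once and lets you state the order of the levels explicitly instead of relying on alphabetical order. The level lists below follow the attribute values of the UCI car dataset, the low-to-high ordering is my assumption about what is meaningful, and the five rows are again copied from the table above.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Assumed low-to-high ordering of each feature's levels.
feature_levels = [
    ["low", "med", "high", "vhigh"],  # column 0: buying price
    ["low", "med", "high", "vhigh"],  # column 1: maintenance price
    ["2", "3", "4", "5more"],         # column 2: number of doors
    ["2", "4", "more"],               # column 3: number of persons
    ["small", "med", "big"],          # column 4: luggage boot size
    ["low", "med", "high"],           # column 5: safety
]

# Feature columns of the first five rows (target column left out).
features = pd.DataFrame(
    [
        ["vhigh", "vhigh", "2", "2", "small", "low"],
        ["vhigh", "vhigh", "2", "2", "small", "med"],
        ["vhigh", "vhigh", "2", "2", "small", "high"],
        ["vhigh", "vhigh", "2", "2", "med", "low"],
        ["vhigh", "vhigh", "2", "2", "med", "med"],
    ]
)

encoder = OrdinalEncoder(categories=feature_levels)
# Each value is replaced by its position in the matching level list (as floats).
X_encoded = encoder.fit_transform(features)

print(X_encoded)
```

With this encoding, "low" safety becomes 0 and "high" becomes 2, so thresholds learned by the tree follow the natural order of the levels. The target column can still be encoded separately with LabelEncoder.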

Regarding "python - ValueError: could not convert string to float: med", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46500357/
