
python - How do I import data from a CSV file in Python using pandas.read_csv?

Reposted. Author: 行者123. Updated: 2023-12-01 07:05:36

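I'm trying to solve a decision tree problem in Python using scikit-learn and pandas. The dataset is provided as a CSV file. When I try to load the data in Python, I get the error "ValueError: could not convert string to float: 'CustomerID'". I don't know what I'm doing wrong in my code.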

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
col_names=['CustomerID','Gender','Car Type', 'Shirt Size','Class']
pima=pd.read_csv(r"F:\Current semster courses\Machine Learning\ML_A1_Fall2019\Q2_dataset.csv",header=None, names=col_names)
pima.head()
feature_cols=['CustomerID','Gender','Car Type', 'Shirt Size']
X=pima[feature_cols]
y=pima.Class
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
clf = DecisionTreeClassifier()

# Train Decision Tree Classifier
clf = clf.fit(X_train,y_train)

#Predict the response for test dataset
y_pred = clf.predict(X_test)
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Can someone tell me what I'm doing wrong?

Dataset:

CustomerID  Gender  Car Type  Shirt Size   Class
1           M       Family    Small        C0
2           M       Sports    Medium       C0
3           M       Sports    Medium       C0
4           M       Sports    Large        C0
5           M       Sports    Extra Large  C0
6           M       Sports    Extra Large  C0
7           F       Sports    Small        C0
8           F       Sports    Small        C0
9           F       Sports    Medium       C0
10          F       Luxury    Large        C0
11          M       Family    Large        C1
12          M       Family    Extra Large  C1
13          M       Family    Medium       C1
14          M       Luxury    Extra Large  C1
15          F       Luxury    Small        C1
16          F       Luxury    Small        C1
17          F       Luxury    Medium       C1
18          F       Luxury    Medium       C1
19          F       Luxury    Medium       C1
20          F       Luxury    Large        C1

Best answer

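Ah, OK. The problem is that your data is categorical, which scikit-learn cannot use directly; it first has to be converted to numerical data. The pd.get_dummies() function does this by taking a single column that holds several categorical values and converting it into multiple columns, each containing a 1 or 0 that indicates whether that category applies to the row.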
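As a minimal sketch of that conversion, the snippet below encodes one categorical column. The three-row frame is a made-up illustration rather than the asker's file, and `dtype=int` is passed only so the output shows 0/1 integers (recent pandas versions otherwise return booleans).

```python
import pandas as pd

# Hypothetical three-row frame with a single categorical column
df = pd.DataFrame({"Car Type": ["Family", "Sports", "Luxury"]})

# One new column per category, named <original column>_<category>
encoded = pd.get_dummies(df, columns=["Car Type"], dtype=int)
print(encoded)
```

Each row ends up with a 1 in exactly one of the new `Car Type_*` columns.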

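By the way, you should remove the 'CustomerID' column from the features. It is an arbitrary identifier that has no bearing on whether a row belongs to one class or the other.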
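One way to separate the identifier and the label from the features is sketched below. The two-row frame is a hypothetical sample that only borrows the question's column names.

```python
import pandas as pd

# Hypothetical two-row sample using the question's column names
df = pd.DataFrame(
    [[1, "M", "Family", "Small", "C0"],
     [11, "M", "Family", "Large", "C1"]],
    columns=["CustomerID", "Gender", "Car Type", "Shirt Size", "Class"],
)

# Keep the label aside, then drop both the identifier and the label from the features
y = df["Class"]
X = df.drop(columns=["CustomerID", "Class"])
print(list(X.columns))
```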

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

col_names=['CustomerID','Gender','Car Type', 'Shirt Size','Class']
data = [['1', 'M', 'Family', 'Small', 'C0'],
['2', 'M', 'Sports', 'Medium', 'C0'],
['3', 'M', 'Sports', 'Medium', 'C0'],
['4', 'M', 'Sports', 'Large', 'C0'],
['5', 'M', 'Sports', 'Extra Large','C0'],
['6', 'M', 'Sports', 'Extra Large','C0'],
['7', 'F', 'Sports', 'Small', 'C0'],
['8', 'F', 'Sports', 'Small', 'C0'],
['9', 'F', 'Sports', 'Medium', 'C0'],
['10', 'F', 'Luxury', 'Large', 'C0'],
['11', 'M', 'Family', 'Large', 'C1'],
['12', 'M', 'Family', 'Extra Large','C1'],
['13', 'M', 'Family', 'Medium', 'C1'],
['14', 'M', 'Luxury', 'Extra Large','C1'],
['15', 'F', 'Luxury', 'Small', 'C1']]

#pima=pd.read_csv("F:\Current semster courses\Machine ...
pima = pd.DataFrame(data, columns=col_names)

# Convert only the categorical feature columns to 0/1 columns for the decision tree.
# CustomerID is an identifier and Class is the label, so neither is encoded.
pima = pd.get_dummies(pima, columns=['Gender', 'Car Type', 'Shirt Size'])
print(pima)

#feature_cols=['CustomerID','Gender','Car Type','Shirt Size']
# The Class columns must not appear among the features, or the label leaks into X
feature_cols = ['Gender_F', 'Gender_M',
                'Car Type_Family', 'Car Type_Luxury', 'Car Type_Sports',
                'Shirt Size_Extra Large', 'Shirt Size_Large', 'Shirt Size_Medium',
                'Shirt Size_Small']
X = pima[feature_cols]
y = pima['Class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

print("X_train =", X_train)
print("X_test =", X_test)
print("y_train =", y_train)
print("y_test =", y_test )
clf = DecisionTreeClassifier()

# Train Decision Tree Classifier
clf = clf.fit(X_train,y_train)

#Predict the response for test dataset
y_pred = clf.predict(X_test)
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
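The error message quotes 'CustomerID', which suggests the file's header line itself ended up among the data rows. The sketch below shows how `header=None` combined with `names=` produces that, and how `header=0` avoids it. The in-memory text stands in for the real file, and a comma-separated layout is an assumption, since the question does not show the raw file.

```python
import io
import pandas as pd

# Stand-in for the CSV file; a comma-separated layout is assumed here
csv_text = (
    "CustomerID,Gender,Car Type,Shirt Size,Class\n"
    "1,M,Family,Small,C0\n"
    "2,M,Sports,Medium,C0\n"
)
col_names = ["CustomerID", "Gender", "Car Type", "Shirt Size", "Class"]

# header=None treats the file's header line as the first data row
with_header_row = pd.read_csv(io.StringIO(csv_text), header=None, names=col_names)
print(with_header_row.iloc[0, 0])

# header=0 marks line 0 as the header, and names= then replaces it
clean = pd.read_csv(io.StringIO(csv_text), header=0, names=col_names)
print(clean["CustomerID"].tolist())
```

With the header row consumed, CustomerID is parsed as integers, and the remaining string columns still need the encoding step described in the answer.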

Regarding "python - How do I import data from a CSV file in Python using pandas.read_csv?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58464750/
