
python - All of my machine learning models get 100% accuracy. What is wrong with my models?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:41:27

I am working on a dataset made up of 5 hand-signed letters. I have uploaded the dataset to Kaggle, so feel free to take a look if you are interested.

https://www.kaggle.com/shayanriyaz/gesture-recognition

So far I have trained and tested several models, and every one of them consistently gets 100% accuracy.

Here is my code.

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
# importing all the necessary packages to use the various classification algorithms
from sklearn.linear_model import LogisticRegression # for Logistic Regression algorithm
from sklearn.model_selection import train_test_split #to split the dataset for training and testing
from sklearn.neighbors import KNeighborsClassifier # for K nearest neighbours
from sklearn import svm #for Support Vector Machine (SVM) Algorithm
from sklearn import metrics #for checking the model accuracy
from sklearn.tree import DecisionTreeClassifier #for using Decision Tree Algorithm
from mpl_toolkits.mplot3d import Axes3D
import os # accessing directory structure

from subprocess import check_output

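# list every file available under the Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

nRowsRead = None # None reads the whole file

df = pd.read_csv('/kaggle/input/ASL_DATA.csv', delimiter=',', nrows = nRowsRead)
# drop the id, timestamp and wrist columns; this has to run after df is loaded
df = df.drop(['Id','Time', 'Wrist_Pitch','Wrist_Roll'],axis = 1)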

df.dataframeName = 'ASL_DATA.csv'
nRow, nCol = df.shape

print(f'There are {nRow} rows and {nCol} columns')

plt.figure(figsize=(30,20))
sns.heatmap(df.select_dtypes(include='number').corr(),annot=True,cmap='cubehelix_r') #draws heatmap of the correlation matrix of the numeric columns (the string Letter column is excluded)
plt.show()

train, test = train_test_split(df, test_size = 0.2)# the main data is split into train and test
# test_size=0.2 splits the data in an 80/20 ratio: train=80% and test=20%
print(train.shape)
print(test.shape)

train_X = train[['Thumb_Pitch','Thumb_Roll','Index_Pitch','Index_Roll','Middle_Pitch','Middle_Roll','Ring_Pitch','Ring_Roll','Pinky_Pitch','Pinky_Roll']]# taking the training data features
train_y=train.Letter# output of our training data
test_X= test[['Thumb_Pitch','Thumb_Roll','Index_Pitch','Index_Roll','Middle_Pitch','Middle_Roll','Ring_Pitch','Ring_Roll','Pinky_Pitch','Pinky_Roll']] # taking test data features
test_y =test.Letter #output value of test data

from sklearn import preprocessing
mm_scaler = preprocessing.RobustScaler()
train_X = mm_scaler.fit_transform(train_X)
test_X = mm_scaler.transform(test_X)


model=DecisionTreeClassifier()
model.fit(train_X,train_y)
prediction=model.predict(test_X)
print('The accuracy of the Decision Tree is',metrics.accuracy_score(prediction,test_y))


model=KNeighborsClassifier(n_neighbors=3) #this examines 3 neighbours for putting the new data into a class
model.fit(train_X,train_y)
prediction=model.predict(test_X)
print('The accuracy of the KNN is',metrics.accuracy_score(prediction,test_y))

Best answer

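There is nothing wrong with your models; this is simply an easy problem for a model to solve. Given all the features you have, these letters look nothing alike. If you had chosen all the letters, or letters that look similar to one another, you would probably see some errors.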
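One way to see how far apart the letters are is to compare per-letter feature statistics. The sketch below is not part of the original answer; it assumes a DataFrame with the `Letter` column and the sensor columns named in the question, and the helper name `summarize_by_letter` is made up for illustration.

```python
import pandas as pd


def summarize_by_letter(df, features, target="Letter"):
    """Return the per-class mean and standard deviation of each feature.

    `features` and `target` are assumed to be column names in `df`,
    e.g. the sensor columns and the `Letter` column from the question.
    """
    return df.groupby(target)[features].agg(["mean", "std"])
```

Calling `summarize_by_letter(df, ['Index_Pitch', 'Index_Roll'])` gives one row per letter. If the class means differ by many standard deviations, almost any classifier should separate the classes cleanly.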

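Re-run your model using only Index_Pitch and Index_Roll. You will still get around 95% AUC. By doing that, you can at least guess that the only errors come from B, D and K, which, judging from images of those signs, are the only three that could plausibly be confused if you look only at the index finger. That turns out to be the case.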
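A minimal sketch of that experiment, assuming the column names from the question; the helper `evaluate_feature_subset` is hypothetical, and it reports plain accuracy rather than AUC. It trains a decision tree on a chosen subset of columns and returns a labelled confusion matrix so you can see which letters are mixed up.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_feature_subset(df, features, target="Letter", seed=0):
    """Train a decision tree on `features` only.

    Returns (accuracy, confusion matrix as a DataFrame) on a 20% hold-out.
    Rows of the matrix are true labels, columns are predicted labels.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[target],
        test_size=0.2, stratify=df[target], random_state=seed,
    )
    model = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)
    pred = model.predict(X_test)

    labels = sorted(df[target].unique())
    cm = confusion_matrix(y_test, pred, labels=labels)
    return accuracy_score(y_test, pred), pd.DataFrame(cm, index=labels, columns=labels)
```

For example, `acc, cm = evaluate_feature_subset(df, ['Index_Pitch', 'Index_Roll'])`. Any non-zero off-diagonal entries in `cm` mark the pairs of letters that the two index-finger features cannot tell apart.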

It is simply a problem that, given your dataset, really is solvable.

Source: a similar question on Stack Overflow, "python - All of my machine learning models get 100% accuracy. What is wrong with my models?": https://stackoverflow.com/questions/60401653/
