
python-3.x - Why is accuracy low on data randomly generated with the sklearn library

Reposted. Author: 行者123. Updated: 2023-11-30 08:41:58

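I generated a normally distributed sample and 3 classes to perform classification. The accuracy I get is very low. I would like to know whether you could give me feedback on how to improve the performance of my LDA classifier. I appreciate your time. Here is my code: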

import pandas as pd
import numpy as np
from random import seed
import random
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
import time

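# Note: seed() here is random.seed, which does not seed NumPy's generator;
# np.random.seed(23) would be needed to make the draws below reproducible.
seed(23)
mu, sigma = 0, 0.1  # mean and standard deviation
x1 = np.random.normal(mu, sigma, 1000)  # 1000 samples of a single feature
x1 = x1.reshape(-1, 1)  # column vector, as scikit-learn expects 2-D input

seed(1)
# Labels are drawn uniformly from {0, 1, 2}, independently of x1
y = np.random.randint(0, 3, size=(1000, 1))
y_cross = np.ravel(y)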
start_time1 = time.time()
clf_s=LinearDiscriminantAnalysis()
print('5-fold cross-validation accuracy score:', np.mean(cross_val_score(clf_s,x1, np.ravel(y), cv=5,scoring='accuracy')))
print('5-fold cross-validation F1 score:', np.mean(cross_val_score(clf_s, x1, np.ravel(y), cv=5,scoring='f1_micro')))
end_time1 = time.time()
print ("Computational time in seconds = " +str(end_time1 - start_time1) )

Result:

5-fold cross-validation accuracy score: 0.3280613765344133
5-fold cross-validation F1 score: 0.3280613765344133
Computational time in seconds = 1.4167194366455078

Best answer

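An accuracy of 0.33 with 3 classes amounts to pure guessing. I think this is expected, because the labels you generated are random. The algorithm is supposed to uncover structure in the data, and the way you prepared the data leaves it no structure to learn. If you want higher accuracy, generate the data properly, for example with sklearn.datasets.make_blobs, and train your algorithm on that dataset.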
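To make the "pure guessing" point concrete, here is a minimal sketch that compares LDA against a trivial baseline on labels drawn independently of the feature. The use of DummyClassifier, the seed value, and the variable names are assumptions added for illustration and are not part of the original question or answer.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Illustrative seed; NumPy's Generator is seeded directly here
rng = np.random.default_rng(0)

# Same shape of data as in the question: one Gaussian feature,
# labels drawn uniformly from {0, 1, 2} with no dependence on the feature
X = rng.normal(0.0, 0.1, size=(1000, 1))
y = rng.integers(0, 3, size=1000)

# LDA has nothing to learn, so it should land near 1/3
lda_acc = cross_val_score(
    LinearDiscriminantAnalysis(), X, y, cv=5, scoring="accuracy"
).mean()

# Baseline that always predicts the most frequent training class
dummy_acc = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X, y, cv=5, scoring="accuracy"
).mean()

print("LDA accuracy:     ", lda_acc)
print("Baseline accuracy:", dummy_acc)
```

If both scores sit close to 1/3, the classifier is doing no better than a rule that ignores the feature entirely, which is what one would expect when the labels carry no information about x.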

Proof

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_blobs

# Three Gaussian clusters; the labels now depend on the features
X, y = make_blobs(n_samples=1000, n_features=2, centers=3, random_state=42)

clf = LinearDiscriminantAnalysis()
np.mean(cross_val_score(clf, X, y, cv=5, scoring='accuracy'))
1.0
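
The same idea can be sketched in the one-feature setting of the question by letting the mean of the normal distribution depend on the class. The class means (-0.5, 0.0, 0.5) and the seed below are hypothetical values chosen for illustration; only sigma = 0.1 comes from the question.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)  # illustrative seed

# Hypothetical per-class means, spaced 5 standard deviations apart
class_means = np.array([-0.5, 0.0, 0.5])
sigma = 0.1  # same standard deviation as in the question

# Draw the labels first, then draw each sample around its class mean,
# so that the feature actually carries information about the label
y = rng.integers(0, 3, size=1000)
X = rng.normal(class_means[y], sigma).reshape(-1, 1)

acc = cross_val_score(
    LinearDiscriminantAnalysis(), X, y, cv=5, scoring="accuracy"
).mean()
print("5-fold cross-validation accuracy:", acc)
```

With the classes this far apart the accuracy should be close to 1; moving the means closer together makes the classes overlap and lowers the attainable accuracy.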

Regarding python-3.x - why accuracy is low on data randomly generated with the sklearn library, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60139880/
