
python - train_test_split random_state not working; produces different output every time

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:19:40

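I've been using KNN on a data set, with random_state = 4 in the train_test_split step. Despite the fixed random state, the accuracy, classification report, predictions, and so on are different on every run. Why is that?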

Here is the head of the data (position is predicted from all_time_runs and order):

   order position  all_time_runs
0     10   NO BAT           1304
1      2  CAN BAT           7396
2      3   NO BAT           6938
3      6  CAN BAT           4903
4      6  CAN BAT           3761

Here is the code for classification and prediction:

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# split the data into features and target (posdf is the DataFrame shown above)
X = posdf.drop('position', axis=1)
y = posdf['position']

knn = KNeighborsClassifier(n_neighbors=5)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# fit the KNN model
knn.fit(X_train, y_train)

# predict with the model
prediction = knn.predict(X_test)

# KNN accuracy score on the test set
score = knn.score(X_test, y_test)

Best answer

train_test_split does have a random component, and you have to pin it down to avoid random results, but it is not the only factor you need to address.
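To rule out the split itself, one option is to call train_test_split twice with the same seed and compare the results. The sketch below uses a small synthetic array as a stand-in for the question's data; the array contents and the helper name split_once are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's features and labels (an assumption).
X = np.arange(40).reshape(20, 2)
y = np.array(["NO BAT", "CAN BAT"] * 10)

def split_once(seed):
    """Return the test-set features and labels produced by one seeded split."""
    _, X_test, _, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    return X_test, y_test

first_X, first_y = split_once(42)
second_X, second_y = split_once(42)

# Same inputs and same seed: compare the two test sets element by element.
same_split = np.array_equal(first_X, second_X) and np.array_equal(first_y, second_y)
print("identical splits:", same_split)
```

If the comparison fails on your own data, the inputs to the split are probably changing between runs, for example because the DataFrame is built or shuffled differently each time.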

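KNN is a model that takes each row of the test set, finds the k nearest training-set vectors, and classifies the row by majority vote. Even when the vote is tied a decision has to be made, and that tie-break can differ between runs. You need to set a seed to make the method reproducible. (set.seed(x) is R syntax; the NumPy equivalent in Python is np.random.seed(x).) Note that scikit-learn's KNeighborsClassifier has no random_state parameter, and its documentation says that when neighbors are equidistant but carry different labels, the result depends on the ordering of the training data, so a row order that changes between runs is also worth checking.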
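One way to check whether the classifier itself contributes run-to-run variation is to fit it twice on the same split and compare the predictions. This is a minimal sketch on synthetic data; the generated values and the helper name fit_and_predict are assumptions, and you would substitute your own X and y.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; replace with your own X and y).
rng = np.random.default_rng(0)
X = rng.integers(0, 10_000, size=(60, 2))
y = rng.choice(["NO BAT", "CAN BAT"], size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

def fit_and_predict():
    """Fit a fresh 5-nearest-neighbors model and predict the test set."""
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train, y_train)
    return knn.predict(X_test)

pred_a = fit_and_predict()
pred_b = fit_and_predict()

# If these differ, the classifier is a source of variation; if they match,
# look upstream at how the data is loaded, ordered, or preprocessed.
predictions_match = np.array_equal(pred_a, pred_b)
print("predictions match:", predictions_match)
```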

The documentation states:

Neighbors-based classification is a type of instance-based learning or non-generalizing learning: it does not attempt to construct a general internal model, but simply stores instances of the training data. Classification is computed from a simple majority vote of the nearest neighbors of each point: a query point is assigned the data class which has the most representatives within the nearest neighbors of the point.
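The majority vote described in the quote can be sketched in plain Python. This is an illustration only, not scikit-learn's implementation: the function knn_predict, the query point, and the tie-break rule (pick the label that sorts first) are all assumptions. The five training rows are the ones shown in the question's data head.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query. Python's sort
    # is stable, so equidistant points keep their original order.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    top = max(votes.values())
    # Deterministic tie-break (an assumption for this sketch): among the
    # labels with the most votes, pick the one that sorts first.
    return min(label for label, count in votes.items() if count == top)

# (order, all_time_runs) pairs and positions from the question's data head.
train_X = [(10, 1304), (2, 7396), (3, 6938), (6, 4903), (6, 3761)]
train_y = ["NO BAT", "CAN BAT", "NO BAT", "CAN BAT", "CAN BAT"]

query = (4, 7000)  # hypothetical player to classify
print(knn_predict(train_X, train_y, query, k=1))
print(knn_predict(train_X, train_y, query, k=2))  # one vote each: tie-break applies
```

With an even k, as in the second call, the two nearest neighbors can vote for different classes, so the result depends entirely on how the tie is broken. A fixed rule like the one above keeps the output the same on every run.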

This question, "python - train_test_split random_state not working; produces different output every time", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/58241767/
