
python - Why do the scores returned by cross_val_score differ from those of my custom cross-validation implementation?

Reposted. Author: 行者123. Updated: 2023-11-30 09:16:09

I implemented my own version of the cross_val_score function, but its results differ from the ones I get with sklearn's cross_val_score.

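import numpy as np
from sklearn.base import clone
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# XTrainSc (scaled features) and yTrain (labels) are defined elsewhere; not shown in the question

modelType = SGDClassifier(random_state=7)

cv2 = StratifiedKFold(5)

# Built-in cross-validation
scores = cross_val_score(modelType, XTrainSc, yTrain, cv=cv2, scoring='accuracy', n_jobs=-1)
print(scores)


modelType = SGDClassifier(random_state=7)

ss = []

# Hand-written cross-validation over the same folds
for ti, vi in cv2.split(XTrainSc, yTrain):
    print(len(ti), len(vi))  # training / validation fold sizes
    model = clone(modelType)  # fresh, unfitted copy for every fold
    model.fit(XTrainSc[ti], yTrain[ti])
    preds = model.predict(XTrainSc[vi])
    ss.append(np.mean(preds == yTrain[vi]))  # accuracy on the validation fold


print(ss)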

Here, scores and ss are not equal. Am I doing something wrong?
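The question does not show how XTrainSc and yTrain were built, so the following is only one possible cause worth ruling out, not a diagnosis. If yTrain is a pandas Series whose index is no longer 0..n-1 (for example after a shuffled train/test split), then yTrain[ti] looks rows up by index label rather than by position, whereas cross_val_score indexes positionally. The sketch below uses made-up labels (a hypothetical y_train) to show the difference; indexing with .iloc or converting first with yTrain.to_numpy() keeps the lookup positional.

```python
import numpy as np
import pandas as pd

# Hypothetical labels whose index was shuffled, e.g. by an earlier train/test split
y_train = pd.Series([0, 1, 0, 1, 1], index=[3, 0, 4, 1, 2])
fold_idx = np.array([0, 1])  # positional indices, as yielded by cv.split()

# On an integer index, [] with an integer array is a *label* lookup
by_label = y_train[fold_idx].to_numpy()
# .iloc is a *positional* lookup, matching what cross_val_score does internally
by_position = y_train.iloc[fold_idx].to_numpy()

print(by_label)     # [1 1]  (values stored under labels 0 and 1)
print(by_position)  # [0 1]  (values at positions 0 and 1)
```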

Best answer

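Besides the model itself, StratifiedKFold can also introduce randomness when it decides which indices go into each fold: with shuffle=True, the fold assignment within each class is shuffled. In that case, setting random_state is essential for reproducibility. With the default shuffle=False, the folds are fully determined by the order of the data.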
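Whether StratifiedKFold actually introduces randomness depends on its shuffle parameter. The sketch below uses a small synthetic label array (illustrative data, not from the question) and a hypothetical helper, fold_indices, to compare the validation folds produced by two consecutive split() calls under each configuration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Illustrative data: 20 samples, two balanced classes
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)


def fold_indices(cv, X, y):
    """Hypothetical helper: validation indices of every fold, as a list of tuples."""
    return [tuple(val_idx) for _, val_idx in cv.split(X, y)]


# shuffle=False (default): folds depend only on the data order
no_shuffle = StratifiedKFold(n_splits=5)
# shuffle=True with an integer seed: shuffled, but identical on every split() call
seeded = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# shuffle=True without a seed: a fresh shuffle on every split() call
unseeded = StratifiedKFold(n_splits=5, shuffle=True)

print(fold_indices(no_shuffle, X, y) == fold_indices(no_shuffle, X, y))  # True
print(fold_indices(seeded, X, y) == fold_indices(seeded, X, y))  # True
print(fold_indices(unseeded, X, y) == fold_indices(unseeded, X, y))  # usually False
```

Passing the same seeded (or unshuffled) splitter object to both cross_val_score and a manual loop therefore guarantees that both see identical folds.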

Here is a reproducible example:

from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.base import clone
import numpy as np

X, y = datasets.load_breast_cancer(return_X_y=True)

# Fixed seed for the model
model = linear_model.SGDClassifier(random_state=7)

# Fixed seed for the folds; recent scikit-learn versions raise a ValueError
# if random_state is set while shuffle=False
cv2 = StratifiedKFold(5, shuffle=True, random_state=0)

scores = cross_val_score(model, X, y, cv=cv2, scoring='accuracy', n_jobs=-1)
print(scores)


model = linear_model.SGDClassifier(random_state=7)

ss = []

for ti, vi in cv2.split(X, y):
    print(len(ti), len(vi))  # training / validation fold sizes
    fold_model = clone(model)  # fresh, unfitted copy for this fold
    fold_model.fit(X[ti], y[ti])
    preds = fold_model.predict(X[vi])
    ss.append(np.mean(preds == y[vi]))  # accuracy on the validation fold


print(ss)

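Output (as reported in the original answer, produced with an older scikit-learn release; the exact fold sizes and scores may differ on other versions, but the two lists should still match each other):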

[0.91304348 0.70434783 0.45132743 0.38938053 0.38053097]
454 115
454 115
456 113
456 113
456 113
[0.9130434782608695, 0.7043478260869566, 0.45132743362831856, 0.3893805309734513, 0.3805309734513274]
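Once both the folds and the model seed are fixed, the hand-written loop and cross_val_score should agree exactly. The sketch below wraps the loop in a hypothetical helper, manual_cv_accuracy, clones a fixed base estimator in every fold, swaps np.mean(preds == y) for sklearn.metrics.accuracy_score (equivalent for plain label arrays), and checks the match with np.allclose. The shuffled, seeded splitter is an assumption chosen to work on current scikit-learn versions.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score


def manual_cv_accuracy(estimator, X, y, cv):
    """Hypothetical helper: a hand-written counterpart of
    cross_val_score(estimator, X, y, cv=cv, scoring='accuracy')."""
    scores = []
    for train_idx, val_idx in cv.split(X, y):
        model = clone(estimator)  # fresh, unfitted copy for every fold
        model.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[val_idx], model.predict(X[val_idx])))
    return np.array(scores)


X, y = datasets.load_breast_cancer(return_X_y=True)
base = linear_model.SGDClassifier(random_state=7)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

builtin = cross_val_score(base, X, y, cv=cv, scoring="accuracy")
manual = manual_cv_accuracy(base, X, y, cv)
print(np.allclose(builtin, manual))  # True: same folds, same model seed
```

If this check fails on your own data, the difference lies in the inputs (for example how X and y are indexed) rather than in the cross-validation logic.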

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/55771594/
