
python - ROC curve with leave-one-out cross-validation in sklearn


I want to plot the ROC curve of a classifier using leave-one-out cross-validation.

It seems that a similar question has been asked here, but without any answers.

In another question here, the following was said:

In order to obtain a meaningful ROC AUC with LeaveOneOut, you need to calculate probability estimates for each fold (each consisting of just one observation), then calculate the ROC AUC on the set of all these probability estimates.
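
That advice can also be implemented without an explicit loop via scikit-learn's cross_val_predict, which pools the out-of-fold probability estimates. A minimal sketch, assuming a binary problem and the clf, X_svc and y defined in my code below:

from sklearn.model_selection import cross_val_predict, LeaveOneOut
from sklearn.metrics import roc_auc_score

# Out-of-fold probability estimates, one test sample per LOO fold,
# stacked into a single (n_samples, n_classes) array
probs = cross_val_predict(clf, X_svc, y, cv=LeaveOneOut(), method='predict_proba')
print(roc_auc_score(y, probs[:, 1]))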

In addition, there is a similar example on the official scikit-learn website, but it uses KFold cross-validation (https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html#sphx-glr-auto-examples-model-selection-plot-roc-crossval-py).


So for the leave-one-out case, I am thinking of collecting all the probability predictions on the test sets (one sample at a time) and, after obtaining the predicted probabilities for all of my folds, computing and plotting the ROC curve. (A per-fold ROC curve would be meaningless here, since each test set contains only a single sample.)

Is this OK? I cannot see any other way to achieve my goal.

Here is my code:

from sklearn.svm import SVC
import numpy as np, matplotlib.pyplot as plt, pandas as pd
from sklearn.model_selection import cross_val_score,cross_val_predict, KFold, LeaveOneOut, StratifiedKFold
from sklearn.metrics import roc_curve, auc
from sklearn import datasets

# Import some data to play with
iris = datasets.load_iris()
X_svc = iris.data
y = iris.target
X_svc, y = X_svc[y != 2], y[y != 2]

clf = SVC(kernel='linear', class_weight='balanced', probability=True, random_state=0)
kf = LeaveOneOut()

all_y = []
all_probs = []
for train, test in kf.split(X_svc, y):
    all_y.append(y[test])
    all_probs.append(clf.fit(X_svc[train], y[train]).predict_proba(X_svc[test])[:, 1])
all_y = np.array(all_y)
all_probs = np.array(all_probs)

fpr, tpr, thresholds = roc_curve(all_y, all_probs)
roc_auc = auc(fpr, tpr)
plt.figure(1, figsize=(12,6))
plt.plot(fpr, tpr, lw=2, alpha=0.5, label='LOOCV ROC (AUC = %0.2f)' % (roc_auc))
plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='k', label='Chance level', alpha=.8)
plt.xlim([-0.05, 1.05])
plt.ylim([-0.05, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.grid()
plt.show()

[Image: the resulting LOOCV ROC curve plot]

Best Answer

I believe the code is correct, and so is the splitting. To validate the implementation and the results, I added a few lines:

from sklearn.svm import SVC
import numpy as np, matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score, cross_val_predict, KFold, LeaveOneOut, StratifiedKFold
from sklearn.metrics import roc_curve, auc
from sklearn import datasets

# Import some data to play with
iris = datasets.load_iris()
X_svc = iris.data
y = iris.target
X_svc, y = X_svc[y != 2], y[y != 2]

clf = SVC(kernel='linear', class_weight='balanced', probability=True, random_state=0)
kf = LeaveOneOut()
if kf.get_n_splits(X_svc) == len(X_svc):
    print("They are the same length, splitting correct")
else:
    print("Something is wrong")
all_y = []
all_probs = []
for train, test in kf.split(X_svc, y):
    all_y.append(y[test])
    all_probs.append(clf.fit(X_svc[train], y[train]).predict_proba(X_svc[test])[:, 1])
all_y = np.array(all_y)
all_probs = np.array(all_probs)
#print(all_y) #For validation
#print(all_probs) #For validation

fpr, tpr, thresholds = roc_curve(all_y, all_probs)
print(fpr, tpr, thresholds) #For validation
roc_auc = auc(fpr, tpr)
plt.figure(1, figsize=(12,6))
plt.plot(fpr, tpr, lw=2, alpha=0.5, label='LOOCV ROC (AUC = %0.2f)' % (roc_auc))
plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='k', label='Chance level', alpha=.8)
plt.xlim([-0.05, 1.05])
plt.ylim([-0.05, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.grid()
plt.show()

The if lines are only there to make sure the splitting is done n times, where n is the number of observations in the given dataset. This is because, as the documentation states, LeaveOneOut works the same as KFold(n_splits=n) and LeavePOut(p=1). Also, when printing the predicted probability values, they look fine, which makes the curve meaningful. Congratulations on your 1.00 AUC!
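
For reference, that equivalence can be checked directly; a minimal sketch, reusing X_svc from the code above:

from sklearn.model_selection import KFold, LeavePOut, LeaveOneOut

n = len(X_svc)  # number of observations
# All three splitters produce exactly n train/test splits on n samples
print(LeaveOneOut().get_n_splits(X_svc))   # n
print(KFold(n_splits=n).get_n_splits())    # n
print(LeavePOut(p=1).get_n_splits(X_svc))  # n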

Regarding "python - ROC curve with leave-one-out cross-validation in sklearn", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57756804/
