python - Why does roc_curve return an additional value (2.0) in the thresholds for some classes?

Reposted by: 太空宇宙, updated: 2023-11-04 00:11:37

I'm using Python 3.5.2 and sklearn 0.19.1.

I have a multiclass problem (3 classes) and I'm using RandomForestClassifier. For one class I have 19 unique predict_proba values:

{0.0,
0.6666666666666666,
0.6736189855024448,
0.6773290780865037,
0.7150826826468751,
0.7175236925236925,
0.7775446850962057,
0.8245648135911781,
0.8631035080004867,
0.8720525244880196,
0.8739595855873906,
0.8787152225755167,
0.9289844333343654,
0.954439314892936,
0.9606503912532541,
0.9771342285323964,
0.9883370916703461,
0.9957401423931763,
1.0}

I'm computing roc_curve, and I expect the ROC curve to have as many points as there are unique probability values. This only holds for 2 of the 3 classes!

When I look at the thresholds returned by the roc_curve function:

fpr, tpr, proba = roc_curve(....)

I see exactly the same values as in the list of probabilities, plus one new value, 2.0!

[2.,
1.,
0.99574014,
0.98833709,
0.97713423,
0.96065039,
0.95443931,
0.92898443,
0.87871522,
0.87395959,
0.87205252,
0.86310351,
0.82456481,
0.77754469,
0.71752369,
0.71508268,
0.67732908,
0.67361899,
0.66666667,
0. ]
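To make the observation concrete, here is a quick pure-Python check (no scikit-learn needed) that the thresholds printed above are exactly the 19 unique scores in decreasing order, plus one leading value. Both lists are copied from the question; the comparison uses a small tolerance because the array was printed rounded to 8 decimal places.

```python
import math

# Unique predict_proba values for the problematic class, copied from the question.
unique_scores = {
    0.0, 0.6666666666666666, 0.6736189855024448, 0.6773290780865037,
    0.7150826826468751, 0.7175236925236925, 0.7775446850962057,
    0.8245648135911781, 0.8631035080004867, 0.8720525244880196,
    0.8739595855873906, 0.8787152225755167, 0.9289844333343654,
    0.954439314892936, 0.9606503912532541, 0.9771342285323964,
    0.9883370916703461, 0.9957401423931763, 1.0,
}

# Thresholds as printed in the question (rounded to 8 decimals by NumPy's repr).
printed_thresholds = [
    2.0, 1.0, 0.99574014, 0.98833709, 0.97713423, 0.96065039, 0.95443931,
    0.92898443, 0.87871522, 0.87395959, 0.87205252, 0.86310351, 0.82456481,
    0.77754469, 0.71752369, 0.71508268, 0.67732908, 0.67361899, 0.66666667,
    0.0,
]

# roc_curve reports thresholds in decreasing order, so sort the scores the same way.
expected = sorted(unique_scores, reverse=True)

# Everything after the first entry matches the unique scores (up to print rounding).
tail_matches = len(printed_thresholds) - 1 == len(expected) and all(
    math.isclose(p, e, abs_tol=1e-8)
    for p, e in zip(printed_thresholds[1:], expected)
)

# The only extra entry is the leading one, and it equals max(score) + 1.
extra = printed_thresholds[0]

print(len(unique_scores), len(printed_thresholds))    # 19 20
print(tail_matches, extra == max(unique_scores) + 1)  # True True
```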

Why is a new threshold of 2.0 returned? I don't see anything about this in the documentation.

Any ideas? Am I missing something?

Best answer

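roc_curve is written so that the ROC point corresponding to the highest threshold, (fpr[0], tpr[0]), is always (0, 0). If that is not the case, it creates a new threshold with the arbitrary value max(y_score) + 1. The relevant code, from the source: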

thresholds : array, shape = [n_thresholds]
    Decreasing thresholds on the decision function used to compute
    fpr and tpr. `thresholds[0]` represents no instances being predicted
    and is arbitrarily set to `max(y_score) + 1`.

and

if tps.size == 0 or fps[0] != 0:
    # Add an extra threshold position if necessary
    tps = np.r_[0, tps]
    fps = np.r_[0, fps]
    thresholds = np.r_[thresholds[0] + 1, thresholds]
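A simplified pure-Python sketch of this logic might look like the following. It is not scikit-learn's actual implementation: it assumes 0/1 labels with 1 as the positive class, a non-empty input containing both classes, and it keeps every point (sklearn uses NumPy and, by default, drops suboptimal intermediate points via drop_intermediate). The function name roc_points is hypothetical. Also note that newer scikit-learn releases changed this detail (recent versions use np.inf for thresholds[0]), so check the documentation of the version you have installed.

```python
def roc_points(y_true, y_score):
    """Hypothetical, simplified re-implementation of the quoted roc_curve logic.

    Assumes 0/1 labels (1 = positive class), non-empty input with both
    classes present, and no drop_intermediate pruning.
    """
    # Sort samples by score, highest first.
    pairs = sorted(zip(y_score, y_true), reverse=True)
    thresholds, tps, fps = [], [], []
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit one ROC point per distinct score, after all ties are counted.
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            thresholds.append(score)
            tps.append(tp)
            fps.append(fp)
    # Same check as the quoted source: if the highest real threshold already
    # yields a false positive, prepend a (0, 0) point whose threshold is
    # arbitrarily set one above the largest score.
    if fps[0] != 0:
        tps = [0] + tps
        fps = [0] + fps
        thresholds = [thresholds[0] + 1] + thresholds
    n_pos, n_neg = tps[-1], fps[-1]
    fpr = [f / n_neg for f in fps]
    tpr = [t / n_pos for t in tps]
    return fpr, tpr, thresholds


# A negative sample scores 1.0, so the first real point is not (0, 0).
print(roc_points([1, 0, 1, 0], [1.0, 1.0, 0.7, 0.2])[2])  # [2.0, 1.0, 0.7, 0.2]
# Only positives score 1.0, so no extra threshold is added.
print(roc_points([1, 1, 0, 0], [1.0, 1.0, 0.7, 0.2])[2])  # [1.0, 0.7, 0.2]
```

With scores capped at 1.0, as with predict_proba, the prepended threshold is always 1.0 + 1 = 2.0, which is the value seen in the question.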

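So in the case you show, it looks like at least one sample that scores 1.0 for this class actually belongs to another class. That sample is a false positive even at the highest threshold, so fps[0] != 0 and the extra threshold 1.0 + 1 = 2.0 is added. For the other two classes, the samples with the top score are all true positives, which is why no extra value appears there.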
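To find out which class triggers the extra threshold, you can check, one class at a time (one-vs-rest, as when calling roc_curve per class), whether any sample from another class shares that class's maximum score. The helper needs_extra_threshold and the toy labels and scores below are hypothetical, and the check mirrors only the fps[0] != 0 condition quoted above.

```python
def needs_extra_threshold(y, class_scores, positive_class):
    """Hypothetical helper: True if the quoted check (fps[0] != 0) would fire.

    y: the original multiclass labels.
    class_scores: the predict_proba column for positive_class.
    """
    top = max(class_scores)
    # At the highest threshold, every sample scoring exactly `top` is predicted
    # positive; any of them that belongs to another class is a false positive.
    return any(s == top and label != positive_class
               for label, s in zip(y, class_scores))


# Made-up 3-class example.
y = [0, 1, 2, 1, 0]
scores_class_1 = [0.0, 1.0, 1.0, 0.8, 0.0]  # a class-2 sample also scores 1.0
scores_class_0 = [1.0, 0.0, 0.0, 0.2, 0.9]  # only a class-0 sample scores 1.0

print(needs_extra_threshold(y, scores_class_1, 1))  # True: thresholds would start at 2.0
print(needs_extra_threshold(y, scores_class_0, 0))  # False: thresholds would start at 1.0
```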

Regarding "python - Why does roc_curve return an additional value (2.0) in the thresholds for some classes?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52358114/


