I am trying to generate a ROC curve for data that is highly imbalanced and multiclass (I know this is not ideal; it was requested by a reviewer of the paper).
scikit-learn has an example of this here:
https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
The specific code I am using is this:
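import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay

# y_onehot_test: one-hot encoded true labels, shape (n_samples, n_classes)
# y_score: predicted class probabilities, shape (n_samples, n_classes)
RocCurveDisplay.from_predictions(
    y_onehot_test.ravel(),
    y_score.ravel(),
    name="micro-average OvR",
    color="darkorange",
    plot_chance_level=True,  # requires scikit-learn >= 1.3
)
plt.axis("square")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Micro-averaged One-vs-Rest\nReceiver Operating Characteristic")
plt.legend()
plt.show()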
I am confused about the averaging: the title says the curve is "micro-averaged OvR", but where do I actually pass this information to the function?
y_onehot_test looks like this:
1
1
1
0
0
...
and y_score looks like this:
0.783307
0.832748
0.619186
0.645178
0.654100
...
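For context, the linked scikit-learn example builds `y_onehot_test` with `LabelBinarizer` as a one-hot matrix of shape (n_samples, n_classes), and `y_score` with `predict_proba` as the matching matrix of class probabilities; the columns shown above appear to be a single-column or flattened view of these. A minimal sketch of how such inputs can be produced, assuming a synthetic imbalanced three-class dataset from `make_classification` and a `LogisticRegression` classifier (both stand-ins, not the asker's actual data or model):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

# Synthetic, deliberately imbalanced 3-class data (a stand-in for real data).
X, y = make_classification(
    n_samples=1000,
    n_classes=3,
    n_informative=4,
    weights=[0.8, 0.15, 0.05],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# One probability column per class: shape (n_test, n_classes).
y_score = clf.predict_proba(X_test)

# One-hot encode the true labels: shape (n_test, n_classes).
label_binarizer = LabelBinarizer().fit(y_train)
y_onehot_test = label_binarizer.transform(y_test)

print(y_onehot_test.shape, y_score.shape)  # both (500, 3)
# .ravel() flattens each matrix into a 1-D array of length 500 * 3.
print(y_onehot_test.ravel().shape, y_score.ravel().shape)
```

Both matrices share the same column order because `LabelBinarizer` and `predict_proba` both sort the class labels, which is what makes flattening them side by side meaningful.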
Thanks for any insights and explanations :)
It seems to me from that link that it's the .ravel() that makes it micro-averaging. I don't think it's an "option" of the plot, but rather the form of y_test and y_score.
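One way to check this, sketched on made-up toy arrays (not the asker's data): flattening the one-hot labels and the scores turns the multiclass problem into one pooled binary problem over every (sample, class) pair, and the AUC of that pooled problem matches `roc_auc_score(..., average="micro")` on the unflattened indicator matrix. `RocCurveDisplay.from_predictions` simply draws the ROC curve of whatever binary arrays it receives, so passing the flattened arrays gives the micro-averaged curve.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Toy one-hot labels (4 samples, 3 classes) and per-class scores.
y_onehot = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])
y_score = np.array([
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.3, 0.5],
])

# Micro-averaging: pool every (sample, class) pair into a single binary problem.
fpr, tpr, _ = roc_curve(y_onehot.ravel(), y_score.ravel())
pooled_auc = auc(fpr, tpr)

# scikit-learn's built-in micro average on the indicator matrix.
micro_auc = roc_auc_score(y_onehot, y_score, average="micro")

print(pooled_auc, micro_auc)
```

No class labels are passed anywhere: the averaging is implied entirely by the shape of the inputs.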
Thanks Joe! After further reading I understand that micro-averaging gives each sample an equal weight, therefore there is no need for class information.
If anyone in the future has the same question: the answer lies in understanding micro-averaging better.
Micro-averaging gives each sample equal weight, so no class information is needed in this case.
If you do want to weight the classes explicitly (for example by class size), you need weighted averaging, which is computed per class and therefore does need the class information.
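To make the difference concrete, here is a sketch on made-up toy arrays (not the asker's data). Macro- and weighted averaging both start from one one-vs-rest AUC per class column, so they need the class structure, while micro-averaging only needs the flattened arrays. The per-class AUCs are computed by hand and compared with `roc_auc_score`'s `average="macro"` and `average="weighted"` options; for indicator-matrix input, the weights are each class's number of true samples (its support).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy imbalanced data: 6 samples of class 0, 2 of class 1, 2 of class 2.
y_onehot = np.array([
    [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0],
    [0, 1, 0], [0, 1, 0],
    [0, 0, 1], [0, 0, 1],
])
y_score = np.array([
    [0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.5, 0.2, 0.3], [0.7, 0.2, 0.1],
    [0.4, 0.4, 0.2], [0.3, 0.5, 0.2],
    [0.3, 0.6, 0.1], [0.5, 0.4, 0.1],
    [0.2, 0.2, 0.6], [0.4, 0.2, 0.4],
])

# One-vs-rest AUC for each class column.
per_class = np.array([
    roc_auc_score(y_onehot[:, k], y_score[:, k])
    for k in range(y_onehot.shape[1])
])
support = y_onehot.sum(axis=0)  # number of true samples per class

macro = per_class.mean()                            # every class counts equally
weighted = np.average(per_class, weights=support)   # classes weighted by size
micro = roc_auc_score(y_onehot.ravel(), y_score.ravel())  # pooled pairs

# The hand-computed averages agree with scikit-learn's built-in options.
assert np.isclose(macro, roc_auc_score(y_onehot, y_score, average="macro"))
assert np.isclose(weighted, roc_auc_score(y_onehot, y_score, average="weighted"))
print(per_class, macro, weighted, micro)
```

With imbalanced data the three numbers generally differ; macro-averaging is the one that keeps a small class from being drowned out by the majority class.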