
python - What is the best way to train a model and favor recall/precision?


I have a binary classification problem whose dataset contains 5% positive labels, and I am training my model with TensorFlow. Here are the results during training:

Step 3819999: loss = 0.22 (0.004 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

Step 3820999: loss = 0.21 (0.003 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

Step 3821999: loss = 0.15 (0.003 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

Step 3822999: loss = 0.15 (0.003 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496
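
For context, these numbers are what you would expect from a model that predicts almost everything as negative: with 5% positive labels, such a model already reaches about 95% accuracy while recall stays near zero. A minimal sanity check (the confusion-matrix counts below are hypothetical, chosen only to roughly reproduce the logged metrics):

# Hypothetical confusion-matrix counts for 20,000 examples with 5% positives.
tp, fn = 11, 989      # 1,000 positives, almost all missed
fp, tn = 11, 18989    # 19,000 negatives, almost all correct

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 0.95 -- dominated by negatives
recall = tp / (tp + fn)                     # 0.011 -- matches the logs
precision = tp / (tp + fp)                  # 0.5 -- matches the logs

print(accuracy, recall, precision)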

What are the main strategies for improving recall? Changing the dataset by adding more positive samples might solve the problem, but altering the reality of the problem seems odd...

It seems to me there should be a way to favor true positives over false negatives, but I can't find one.

Best Answer

You should use weighted cross entropy instead of the classic cross entropy (CE). From the TensorFlow documentation:

This is like sigmoid_cross_entropy_with_logits() except that pos_weight allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error. The usual cross-entropy cost is defined as:

targets * -log(sigmoid(logits)) + (1 - targets) * -log(1 - sigmoid(logits))

A value pos_weight > 1 decreases the false negative count, hence increasing the recall. Conversely, setting pos_weight < 1 decreases the false positive count and increases the precision. This can be seen from the fact that pos_weight is introduced as a multiplicative coefficient for the positive targets term in the loss expression:

targets * -log(sigmoid(logits)) * pos_weight + (1 - targets) * -log(1 - sigmoid(logits))
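
A minimal sketch of how this could be applied, assuming TensorFlow 2.x and its tf.nn.weighted_cross_entropy_with_logits; the pos_weight of 19.0 is only an illustrative starting point derived from the 5% positive rate in the question (0.95 / 0.05 = 19), not a tuned value:

import tensorflow as tf

# Hypothetical logits and labels for a batch of four examples.
logits = tf.constant([[1.2], [-0.8], [0.3], [-2.1]])
labels = tf.constant([[1.0], [0.0], [1.0], [0.0]])

# pos_weight > 1 makes false negatives more expensive than false
# positives, pushing the model toward higher recall. With 5% positives,
# the negative/positive ratio 0.95 / 0.05 = 19 is a common first guess.
loss = tf.nn.weighted_cross_entropy_with_logits(
    labels=labels, logits=logits, pos_weight=19.0)

# Reduce the per-example losses to a scalar for the optimizer.
mean_loss = tf.reduce_mean(loss)

In practice you would sweep pos_weight on a validation set and pick the value with the best precision/recall trade-off for your application, since raising it buys recall at the cost of precision.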

Regarding python - What is the best way to train a model and favor recall/precision?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51157904/
