machine-learning - What ratio of positive to negative examples should a training set have to produce an unskewed classifier?

Reposted. Author: 行者123. Updated: 2023-11-30 09:56:27

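My training dataset contains 46071 examples of one class and 33606 examples of the other. Will this lead to a skewed classifier? I am using an SVM, but I do not want to use the SVM's options for handling skewed data.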

Best answer

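A dataset is skewed if the classification categories are not approximately equally represented (I don't think there is a precise threshold).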

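Your dataset is not highly imbalanced (roughly 58% versus 42%). Even so, it could introduce a bias toward the majority (and possibly uninteresting) class, especially if you use accuracy to evaluate the classifier.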
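To see why accuracy can mislead here, consider a degenerate classifier that always predicts the larger class. The sketch below uses the class counts from the question; the helper names and the choice to treat the larger class as "positive" are assumptions made only for illustration. It compares plain accuracy with balanced accuracy (the mean of the per-class recalls).

```python
# Class counts taken from the question.
n_major, n_minor = 46071, 33606


def accuracy(tp, tn, fp, fn):
    """Fraction of all examples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of the recall on each class, so both classes weigh equally."""
    recall_pos = tp / (tp + fn)
    recall_neg = tn / (tn + fp)
    return (recall_pos + recall_neg) / 2


# Hypothetical classifier that always predicts the majority class
# (assumed here to be the "positive" label).
tp, fn = n_major, 0   # every majority example is labelled correctly
tn, fp = 0, n_minor   # every minority example is labelled incorrectly

acc = accuracy(tp, tn, fp, fn)
bal = balanced_accuracy(tp, tn, fp, fn)
print(f"accuracy:          {acc:.3f}")
print(f"balanced accuracy: {bal:.3f}")
```

A classifier that has learned nothing about the minority class still scores about 0.58 on accuracy, while its balanced accuracy stays at 0.5, which is the chance level for two classes.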

A skewed training set can be handled in several ways. Two commonly used approaches are:
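One widely used way to manage a skewed training set at the data level, without touching any SVM setting, is random undersampling of the majority class. It is offered here as an illustration and an assumption; it is not necessarily one of the approaches this answer refers to. The helper `undersample` and the toy data are hypothetical.

```python
import random


def undersample(examples, labels, seed=0):
    """Randomly drop examples so that every class keeps as many
    examples as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the examples by their class label.
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)

    # Each class is reduced to the size of the smallest class.
    n_keep = min(len(xs) for xs in by_class.values())

    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_keep):
            balanced.append((x, y))

    # Shuffle so the classes are not stored in contiguous runs.
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]


# Toy data: 7 examples of class 0 and 3 examples of class 1.
examples = list(range(10))
labels = [0] * 7 + [1] * 3

bx, by = undersample(examples, labels)
print(by.count(0), by.count(1))
```

The trade-off is that undersampling discards data. With a split as mild as the one in the question, that loss may well outweigh any benefit.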

It is worth noting (from Issue on Learning from Imbalanced Data Sets):

in certain domains (e.g. fraud detection) the class imbalance is intrinsic to the problem: there are typically very few cases of fraud as compared to the large number of honest use of the facilities.

However, class imbalances sometimes occur in domains that do not have an intrinsic imbalance.

This will happen when the data collection process is limited (e.g. due to economic or privacy reasons), thus creating artificial imbalances.

Conversely, in certain cases, the data abounds and it is for the scientist to decide which examples to select and in what quantity.

In addition, there can also be an imbalance in costs of making different errors, which could vary per case.
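The last point of the quotation, that different errors can carry different costs, can be made concrete with a small calculation. The error counts and cost values below are hypothetical and chosen only to show the effect; `total_cost` is an illustrative helper, not part of any library.

```python
def total_cost(fp, fn, cost_fp, cost_fn):
    """Total misclassification cost given the counts of false positives
    and false negatives and the cost of each kind of error."""
    return fp * cost_fp + fn * cost_fn


# Hypothetical costs: a false negative is ten times as expensive
# as a false positive.
COST_FP, COST_FN = 1.0, 10.0

# Two hypothetical classifiers that both make 40 errors in total.
cost_a = total_cost(fp=30, fn=10, cost_fp=COST_FP, cost_fn=COST_FN)
cost_b = total_cost(fp=10, fn=30, cost_fp=COST_FP, cost_fn=COST_FN)

print(cost_a, cost_b)
```

Both classifiers have the same error count, and therefore the same accuracy on the same data, yet under these assumed costs the second is more than twice as expensive as the first.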

So it really all depends on your data!

More details:

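Regarding "machine-learning - What ratio of positive to negative examples should a training set have to produce an unskewed classifier?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26156503/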
