machine-learning - Prediction from a highly skewed dataset

Reposted. Author: 行者123. Updated: 2023-11-30 09:29:57

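I want to find the factors that lead to a particular event. However, the event occurs only about 1% of the time. So if I have a class attribute called event_happened, its value is 0 in 99% of cases and 1 in only 1% of cases. Traditional data mining prediction techniques (decision trees, naive Bayes, etc.) don't seem to work in this situation. Any suggestions on how to mine this dataset? Thanks.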

Best answer

This is a typical description of an anomaly detection task, which has its own set of algorithms:

In data mining, anomaly detection (or outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset.

And a statement about the possible approaches:

Three broad categories of anomaly detection techniques exist. Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set under the assumption that the majority of the instances in the data set are normal by looking for instances that seem to fit least to the remainder of the data set. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involves training a classifier (the key difference to many other statistical classification problems is the inherent unbalanced nature of outlier detection). Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set, and then testing the likelihood of a test instance to be generated by the learned model.
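As a rough illustration of the semi-supervised category quoted above, the sketch below fits a simple model of "normal" behavior and then scores new instances by how likely they are under that model. Everything in it is an assumption made for illustration: the two synthetic features, the independent per-feature Gaussian model, the helper names fit_normal_model and log_likelihood, and the 1% threshold. It is one possible way to realize the idea, not the method the quoted text prescribes.

```python
import math
import random
import statistics

def fit_normal_model(rows):
    """Estimate a per-feature mean and standard deviation from 'normal' rows."""
    columns = list(zip(*rows))
    return [(statistics.fmean(col), statistics.stdev(col)) for col in columns]

def log_likelihood(model, row):
    """Log-density of a row under independent per-feature Gaussians."""
    total = 0.0
    for (mean, std), value in zip(model, row):
        z = (value - mean) / std
        total += -0.5 * z * z - math.log(std * math.sqrt(2 * math.pi))
    return total

# Hypothetical training set containing only normal behavior (synthetic data).
random.seed(0)
normal_train = [[random.gauss(0, 1), random.gauss(5, 2)] for _ in range(1000)]
model = fit_normal_model(normal_train)

# Threshold: flag anything less likely than roughly the least likely 1%
# of the training rows.
train_scores = sorted(log_likelihood(model, r) for r in normal_train)
threshold = train_scores[len(train_scores) // 100]

typical = [0.1, 5.3]    # close to the training distribution
unusual = [6.0, -9.0]   # far from it on both features

for name, row in [("typical", typical), ("unusual", unusual)]:
    score = log_likelihood(model, row)
    print(name, round(score, 2), "anomaly" if score < threshold else "normal")
```

The threshold here is tied to the expected event rate (about 1% in the question); in practice it would be tuned on held-out data.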

Which one you choose is a matter of personal preference.

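These methods help to "learn" how to identify the anomalous events; the model that then "predicts" them will define the factors you are interested in.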
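To make the last point concrete, here is a minimal sketch of the supervised route: train a classifier that compensates for the class imbalance, then read the factors off the fitted model. It uses a class-weighted logistic regression written in plain Python, which is a stand-in chosen for illustration and not something the answer itself names. The feature names x0, x1, x2, the synthetic data (where only x0 actually drives the event), and the hyperparameters are all assumptions. The features are on the same scale, so coefficient magnitudes can be compared directly.

```python
import math
import random

def sigmoid(z):
    # Clamp to avoid overflow in math.exp for extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_weighted_logistic(X, y, epochs=200, lr=0.5):
    """Batch gradient descent on a class-weighted log-loss."""
    n, d = len(X), len(X[0])
    pos = sum(y)
    # Weight each class inversely to its frequency so the rare class
    # contributes as much to the loss as the common one.
    w_pos, w_neg = n / (2.0 * pos), n / (2.0 * (n - pos))
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = (p - label) * (w_pos if label else w_neg)
            grad_b += err
            for j, x in enumerate(row):
                grad_w[j] += err * x
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Synthetic, hypothetical data: the event is rare and depends only on x0.
random.seed(0)
names = ["x0", "x1", "x2"]
X, y = [], []
for _ in range(2000):
    row = [random.gauss(0, 1) for _ in names]
    p_event = sigmoid(-7.0 + 2.5 * row[0])
    X.append(row)
    y.append(1 if random.random() < p_event else 0)

weights, bias = fit_weighted_logistic(X, y)

# Rank candidate factors by the magnitude of their coefficients.
ranked = sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)
for name, w in ranked:
    print(f"{name}: {w:+.2f}")
```

With real data the features would need to be standardized first, and a coefficient ranking only suggests association with the event, not causation.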

Regarding machine-learning - Prediction from a highly skewed dataset, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24830485/
