
machine-learning - Interpreting random forest model results


I would greatly appreciate feedback on how to interpret my RF model and how to evaluate the results overall.

```
57658 samples
   27 predictor
    2 classes: 'stayed', 'left'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 11531, 11531, 11532, 11532, 11532
Resampling results across tuning parameters:

  mtry  splitrule   ROC        Sens       Spec
   2    gini        0.6273579  0.9999011  0.0006250729
   2    extratrees  0.6246980  0.9999197  0.0005667791
  14    gini        0.5968382  0.9324610  0.1116113149
  14    extratrees  0.6192781  0.9740323  0.0523004026
  27    gini        0.5584677  0.7546156  0.2977507092
  27    extratrees  0.5589923  0.7635036  0.2905489827

Tuning parameter 'min.node.size' was held constant at a value of 1
ROC was used to select the optimal model using the largest value.
The final values used for the model were mtry = 2, splitrule = gini and min.node.size = 1.
```

After several adjustments to the functional form of my Y variable and to how I split the data, I got the results below: my ROC improved slightly, but interestingly, my Sens and Spec changed dramatically compared to my initial model.

```
35000 samples
   27 predictor
    2 classes: 'stayed', 'left'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 7000, 7000, 7000, 7000, 7000
Resampling results across tuning parameters:

  mtry  splitrule   ROC        Sens          Spec
   2    gini        0.6351733  0.0004618204  0.9998685
   2    extratrees  0.6287926  0.0000000000  0.9999899
  14    gini        0.6032979  0.1346653886  0.9170874
  14    extratrees  0.6235212  0.0753069696  0.9631711
  27    gini        0.5725621  0.3016414054  0.7575899
  27    extratrees  0.5716616  0.2998190728  0.7636219

Tuning parameter 'min.node.size' was held constant at a value of 1
ROC was used to select the optimal model using the largest value.
The final values used for the model were mtry = 2, splitrule = gini and min.node.size = 1.
```

This time I split the data randomly rather than by time, and tried several mtry values with the following code:

```{r Cross Validation Part 1}
library(caret) # for createFolds() and train()

set.seed(1992) # setting a seed for replication purposes

# Partition the data into 5 folds (createFolds returns held-out indices by default)
folds <- createFolds(train_data$left_welfare, k = 5)

# Note: "variance" is ranger's regression splitrule; for classification it
# expects "gini" or "extratrees", hence the ROC = 0.5 / NaN rows below
tune_mtry <- expand.grid(mtry = c(2, 10, 15, 20),
                         splitrule = c("variance", "extratrees"),
                         min.node.size = c(1, 5, 10))

sapply(folds, length) # sanity-check the fold sizes
```

This produced the following results:

```
Random Forest

84172 samples
   14 predictor
    2 classes: 'stayed', 'left'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 16834, 16834, 16834, 16835, 16835
Resampling results across tuning parameters:

  mtry  splitrule   ROC        Sens       Spec
   2    variance    0.5000000        NaN        NaN
   2    extratrees  0.7038724  0.3714761  0.8844723
   5    variance    0.5000000        NaN        NaN
   5    extratrees  0.7042525  0.3870192  0.8727755
   8    variance    0.5000000        NaN        NaN
   8    extratrees  0.7014818  0.4075797  0.8545012
  10    variance    0.5000000        NaN        NaN
  10    extratrees  0.6956536  0.4336180  0.8310368
  12    variance    0.5000000        NaN        NaN
  12    extratrees  0.6771292  0.4701687  0.7777730
  15    variance    0.5000000        NaN        NaN
  15    extratrees  0.5000000        NaN        NaN

Tuning parameter 'min.node.size' was held constant at a value of 1
ROC was used to select the optimal model using the largest value.
The final values used for the model were mtry = 5, splitrule = extratrees and min.node.size = 1.
```
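
For reference, the post never shows the train() call that generates caret output of this form. Below is a minimal sketch of what it presumably looks like with the ranger engine; only train_data, left_welfare and tune_mtry come from the question, everything else is an assumption (the small resample sizes above suggest the author's actual resampling setup may have differed):

```{r}
# Hedged sketch: a typical caret + ranger fit that would produce
# resampling tables like the ones shown above
ctrl <- trainControl(method = "cv", number = 10,
                     classProbs = TRUE,                 # required for ROC
                     summaryFunction = twoClassSummary) # reports ROC/Sens/Spec

rf_fit <- train(left_welfare ~ ., data = train_data,
                method = "ranger",  # random forest backend
                metric = "ROC",     # matches "ROC was used to select the optimal model"
                trControl = ctrl,
                tuneGrid = tune_mtry)
```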

Best Answer

It looks like your random forest has almost no predictive power for the second class, 'left'. The best scores all combine extremely high sensitivity with very low specificity, which basically means the classifier just labels everything as the 'stayed' class, which I assume is the majority class. Unfortunately, this is quite bad, since it is not far from a naive classifier that assigns everything to the first class.
Also, it isn't clear to me whether you tried only mtry values of 2, 14 and 27, but if that is the case, I would strongly suggest trying the whole 3-25 range (the optimal value most likely lies somewhere in between); a sketch of such a grid follows.
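
For illustration, a grid covering that fuller range could look like the following (assuming the same caret/ranger setup as in the question, and restricting splitrule to ranger's classification options):

```{r}
# Hypothetical tuning grid scanning the whole mtry range suggested above
tune_mtry <- expand.grid(mtry = 3:25,
                         splitrule = c("gini", "extratrees"),
                         min.node.size = c(1, 5, 10))
```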

Apart from that, since performance looks rather poor (judging by the ROC), I would suggest working more on feature engineering to extract more information. Otherwise, if you are satisfied with what you have, or you think nothing more can be extracted, just adjust the classification probability threshold so that sensitivity and specificity reflect your requirements for the classes (you may care more about misclassifying 'stayed' than 'left', or vice versa; I don't know your problem).
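
As a minimal sketch of that threshold adjustment, assuming a fitted caret model rf_fit and a hypothetical held-out set test_data (neither appears in the original post):

```{r}
# Predict class probabilities rather than hard labels
probs <- predict(rf_fit, newdata = test_data, type = "prob")

# Lowering the cutoff for 'left' below the default 0.5 trades specificity
# for sensitivity on that class; 0.3 is an arbitrary illustrative value
pred_class <- factor(ifelse(probs$left > 0.3, "left", "stayed"),
                     levels = c("stayed", "left"))

confusionMatrix(pred_class, test_data$left_welfare) # performance at the new cutoff
```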

Hope this helps!

Regarding machine-learning - interpreting random forest model results, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59201857/
