
machine-learning - Interpreting random forest model results


I would greatly appreciate feedback on how to interpret my RF model and how to evaluate the results overall.

```
57658 samples
   27 predictor
    2 classes: 'stayed', 'left'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 11531, 11531, 11532, 11532, 11532
Resampling results across tuning parameters:

  mtry  splitrule   ROC        Sens       Spec
   2    gini        0.6273579  0.9999011  0.0006250729
   2    extratrees  0.6246980  0.9999197  0.0005667791
  14    gini        0.5968382  0.9324610  0.1116113149
  14    extratrees  0.6192781  0.9740323  0.0523004026
  27    gini        0.5584677  0.7546156  0.2977507092
  27    extratrees  0.5589923  0.7635036  0.2905489827

Tuning parameter 'min.node.size' was held constant at a value of 1
ROC was used to select the optimal model using the largest value.
The final values used for the model were mtry = 2, splitrule = gini and min.node.size = 1.
```

After several adjustments to the functional form of my Y variable and to how I split the data, I got the results below: my ROC improved slightly, but interestingly, my Sens and Spec changed dramatically compared to my initial model.

```
35000 samples
   27 predictor
    2 classes: 'stayed', 'left'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 7000, 7000, 7000, 7000, 7000
Resampling results across tuning parameters:

  mtry  splitrule   ROC        Sens          Spec
   2    gini        0.6351733  0.0004618204  0.9998685
   2    extratrees  0.6287926  0.0000000000  0.9999899
  14    gini        0.6032979  0.1346653886  0.9170874
  14    extratrees  0.6235212  0.0753069696  0.9631711
  27    gini        0.5725621  0.3016414054  0.7575899
  27    extratrees  0.5716616  0.2998190728  0.7636219

Tuning parameter 'min.node.size' was held constant at a value of 1
ROC was used to select the optimal model using the largest value.
The final values used for the model were mtry = 2, splitrule = gini and min.node.size = 1.
```

This time I split the data randomly rather than by time, and tried several mtry values with the following code:

```{r Cross Validation Part 1}
library(caret) # for createFolds() and train()

set.seed(1992) # setting a seed for replication purposes

# Partition the data into 5 folds (createFolds returns held-out indices by default)
folds <- createFolds(train_data$left_welfare, k = 5)

# Note: "variance" is ranger's regression splitrule; for classification it
# expects "gini" or "extratrees", hence the ROC = 0.5 / NaN rows below
tune_mtry <- expand.grid(mtry = c(2, 10, 15, 20),
                         splitrule = c("variance", "extratrees"),
                         min.node.size = c(1, 5, 10))

sapply(folds, length) # sanity-check the fold sizes
```

This produced the following results:

```
Random Forest

84172 samples
   14 predictor
    2 classes: 'stayed', 'left'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 16834, 16834, 16834, 16835, 16835
Resampling results across tuning parameters:

  mtry  splitrule   ROC        Sens       Spec
   2    variance    0.5000000        NaN        NaN
   2    extratrees  0.7038724  0.3714761  0.8844723
   5    variance    0.5000000        NaN        NaN
   5    extratrees  0.7042525  0.3870192  0.8727755
   8    variance    0.5000000        NaN        NaN
   8    extratrees  0.7014818  0.4075797  0.8545012
  10    variance    0.5000000        NaN        NaN
  10    extratrees  0.6956536  0.4336180  0.8310368
  12    variance    0.5000000        NaN        NaN
  12    extratrees  0.6771292  0.4701687  0.7777730
  15    variance    0.5000000        NaN        NaN
  15    extratrees  0.5000000        NaN        NaN

Tuning parameter 'min.node.size' was held constant at a value of 1
ROC was used to select the optimal model using the largest value.
The final values used for the model were mtry = 5, splitrule = extratrees and min.node.size = 1.
```
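
For reference, the post never shows the train() call that generates caret output of this form. Below is a minimal sketch of what it presumably looks like with the ranger engine; only train_data, left_welfare and tune_mtry come from the question, everything else is an assumption (the small resample sizes above suggest the author's actual resampling setup may have differed):

```{r}
# Hedged sketch: a typical caret + ranger fit that would produce
# resampling tables like the ones shown above
ctrl <- trainControl(method = "cv", number = 10,
                     classProbs = TRUE,                 # required for ROC
                     summaryFunction = twoClassSummary) # reports ROC/Sens/Spec

rf_fit <- train(left_welfare ~ ., data = train_data,
                method = "ranger",  # random forest backend
                metric = "ROC",     # matches "ROC was used to select the optimal model"
                trControl = ctrl,
                tuneGrid = tune_mtry)
```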

Best Answer

It looks like your random forest has almost no predictive power for the second class, 'left'. The best scores all combine extremely high sensitivity with very low specificity, which basically means the classifier just labels everything as the 'stayed' class, which I assume is the majority class. Unfortunately, this is quite bad, since it is not far from a naive classifier that assigns everything to the first class.
Also, it isn't clear to me whether you tried only mtry values of 2, 14 and 27, but if that is the case, I would strongly suggest trying the whole 3-25 range (the optimal value most likely lies somewhere in between); a sketch of such a grid follows.
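
For illustration, a grid covering that fuller range could look like the following (assuming the same caret/ranger setup as in the question, and restricting splitrule to ranger's classification options):

```{r}
# Hypothetical tuning grid scanning the whole mtry range suggested above
tune_mtry <- expand.grid(mtry = 3:25,
                         splitrule = c("gini", "extratrees"),
                         min.node.size = c(1, 5, 10))
```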

Apart from that, since performance looks rather poor (judging by the ROC), I would suggest working more on feature engineering to extract more information. Otherwise, if you are satisfied with what you have, or you think nothing more can be extracted, just adjust the classification probability threshold so that sensitivity and specificity reflect your requirements for the classes (you may care more about misclassifying 'stayed' than 'left', or vice versa; I don't know your problem).
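
As a minimal sketch of that threshold adjustment, assuming a fitted caret model rf_fit and a hypothetical held-out set test_data (neither appears in the original post):

```{r}
# Predict class probabilities rather than hard labels
probs <- predict(rf_fit, newdata = test_data, type = "prob")

# Lowering the cutoff for 'left' below the default 0.5 trades specificity
# for sensitivity on that class; 0.3 is an arbitrary illustrative value
pred_class <- factor(ifelse(probs$left > 0.3, "left", "stayed"),
                     levels = c("stayed", "left"))

confusionMatrix(pred_class, test_data$left_welfare) # performance at the new cutoff
```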

Hope this helps!

Regarding machine-learning - interpreting random forest model results, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59201857/
