r - cforest varimp does not seem to work with categorical predictors

Reposted by: 行者123. Updated: 2023-12-01 14:46:04

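I am trying to run a random forest model using the party package. I want to use the varimp function to compute conditional variable importance, but it does not seem to accept categorical variables. Here is a link to my data, and below is the code I am using.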

> library("party")
> #set up dataframe
> bll = read.csv("bll_Nov2013.csv", header=TRUE)
> SB_Pres <- bll$Sandbar_Presence #binary presence/absence
> Slope <-bll$Slope
> Dist2Shr <-bll$Dist2Shr
> Bathy <-bll$Bathy2
> Chla <-bll$GSM_Chl_Daily_MF
> SST <-bll$SST_PF_daily
> Region <- bll$Region
> MoonPhase <-bll$MoonPhase
> DaylightHours <- bll$DaylightHours
> bll_SB <- na.omit(data.frame(SB_Pres, Slope, Dist2Shr, Bathy, Chla, SST, DaylightHours, MoonPhase, Region))

> #run cforest model
> SBcf<- cforest(formula = factor(SB_Pres) ~ SST + Chla + Dist2Shr+ DaylightHours + Bathy + Slope + MoonPhase + factor(Region), data = bll_SB,  control = cforest_unbiased())
> SBcf

     Random Forest using Conditional Inference Trees

Number of trees:  500 

Response:  factor(SB_Pres) 
Inputs:  SST, Chla, Dist2Shr, DaylightHours, Bathy, Slope, MoonPhase, factor(Region) 
Number of observations:  534 

> #Varimp works if conditional = FALSE
> varimp(SBcf, conditional = FALSE)
           SST           Chla       Dist2Shr  DaylightHours          Bathy          Slope 
   0.024744898    0.084244898    0.015632653    0.009571429    0.006448980    0.003357143 
     MoonPhase factor(Region) 
   0.002724490    0.095000000 


> #Varimp does NOT work if conditional = TRUE
> varimp(SBcf, conditional = TRUE)
Error in model.frame.default(formula = ~SST + Chla + Dist2Shr + DaylightHours +  : 
  variable lengths differ (found for 'factor(Region)')

If I remove the factor(Region) variable, the conditional variable importance can be computed.
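
One possible explanation, offered as an assumption rather than a confirmed diagnosis of party's internals: the error message comes from model.frame, which looks up each formula term first in `data` and then in the formula's environment. If the data frame used internally has no column named Region (only one labelled factor(Region)), the term factor(Region) could fall back to the global vector Region taken from the unfiltered bll, whose length differs from the 534 rows left after na.omit. The toy objects below (df and a global Region of a different length) are hypothetical and only reproduce the same base R error.

```r
# Hypothetical toy data: a global vector and a shorter data frame without that column.
Region <- c("A", "B", "A", "B", "A", "B")    # length 6, lives outside the data frame
df <- data.frame(x = c(0.1, 0.2, 0.3, 0.4))  # 4 rows, no Region column

# model.frame() cannot find Region in df, so it picks up the outer vector instead,
# and the lengths (6 vs 4) no longer match.
msg <- tryCatch(
  {
    model.frame(~ x + factor(Region), data = df)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Storing the factor as a column of the data frame keeps the lookup inside df.
df$Region <- factor(c("A", "B", "A", "B"))
mf <- model.frame(~ x + Region, data = df)
print(nrow(mf))
```

If this is what happens, converting Region to a factor inside the data frame before calling cforest, and using the plain column name in the formula, should avoid the mismatch.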

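Is this a known behaviour of the party package's varimp function with categorical predictors? From what I have read, it should be able to handle categorical predictors (Conditional variable importance for random forests - Strobl et al), although it does not explicitly state that varimp(obj, conditional = TRUE) can be used with categorical predictors.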

Any insight would be greatly appreciated!

Thanks,

Lisa

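Edit: To clarify, if you define the variable with as.factor outside the formula, as.factor does not actually take effect: the results are the same whether or not Region is specified as a factor. Compare these results with the other varimp run above (conditional = FALSE), where the output shows the variable as "factor(Region)", whereas below it appears simply as "Region" in both runs.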

> library("party")
> packageDescription("party")$Version
[1] "1.0-10"
> bll = read.csv("bll_SB.csv", header=TRUE)
> bll_SB <- na.omit(data.frame(bll))

> # region is specified as a factor
> bll_SB$SB_Pres <- factor(bll_SB$SB_Pres)
> bll_SB$Region <- factor(bll_SB$Region)
> set.seed(1)
> SBcf <- cforest(SB_Pres ~ ., data=bll_SB,  control=cforest_unbiased())
> SBcf


     Random Forest using Conditional Inference Trees

Number of trees:  500 

Response:  SB_Pres 
Inputs:  Slope, Dist2Shr, Bathy, Chla, SST, DaylightHours, MoonPhase, Region 
Number of observations:  534 

> system.time(res1 <- varimp(SBcf, conditional = FALSE))
   user  system elapsed 
  4.466   0.013   4.480 
> res1
        Slope      Dist2Shr         Bathy          Chla           SST DaylightHours 
  0.003632653   0.015908163   0.008285714   0.085367347   0.028846939   0.009520408 
    MoonPhase        Region 
  0.002969388   0.093061224 


> # Run again, region is not specified as a factor
> bll_SB$Region <- bll_SB$Region
> set.seed(1)
> SBcf <- cforest(SB_Pres ~ ., data=bll_SB,  control=cforest_unbiased())
> system.time(res2 <- varimp(SBcf, conditional = FALSE))
   user  system elapsed 
  4.562   0.015   4.578 
> res2
        Slope      Dist2Shr         Bathy          Chla           SST DaylightHours 
  0.003632653   0.015908163   0.008285714   0.085367347   0.028846939   0.009520408 
    MoonPhase        Region 
  0.002969388   0.093061224
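
A likely reason the two runs agree, offered as an assumption about the setup: in R versions before 4.0.0, read.csv defaulted to stringsAsFactors = TRUE, so if Region is stored as text it is already a factor when it is read in, and calling factor() on it again changes nothing. In addition, `bll_SB$Region <- bll_SB$Region` assigns the column to itself, so it does not undo the earlier conversion. The sketch below uses a small made-up CSV (csv_text is hypothetical) to check both points.

```r
# Hypothetical three-row CSV standing in for bll_SB.csv.
csv_text <- "Region,SB_Pres\nNorth,1\nSouth,0\nNorth,1"

# stringsAsFactors = TRUE mimics the read.csv default of R versions before 4.0.0.
d <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

already_factor <- is.factor(d$Region)   # the text column was read in as a factor
refactored <- factor(d$Region)          # converting it again
same_levels <- identical(levels(refactored), levels(d$Region))
same_codes <- identical(as.integer(refactored), as.integer(d$Region))

# Self-assignment leaves the column's class untouched.
d$Region <- d$Region
still_factor <- is.factor(d$Region)

print(c(already_factor, same_levels, same_codes, still_factor))
```

To fit the model with Region as a non-factor for comparison, the column would have to be converted explicitly, for example with as.character or as.integer, before calling cforest.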

Best answer

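I cannot reproduce the problem with your example. I was able to compute the conditional variable importance for your data set using the following code: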

R> library("party")
R> packageDescription("party")$Version
[1] "1.0-10"

R> bll = read.csv("bll_SB.csv", header=TRUE)
R>
R> bll_SB <- na.omit(data.frame(bll))
R> bll_SB$SB_Pres <- factor(bll_SB$SB_Pres)
R> bll_SB$Region <- factor(bll_SB$Region)
R>
R> set.seed(1)
R> SBcf <- cforest(SB_Pres ~ ., data=bll_SB,  control=cforest_unbiased())
R> SBcf  
#
#          Random Forest using Conditional Inference Trees
#
# Number of trees:  500
#
# Response:  SB_Pres
# Inputs:  Slope, Dist2Shr, Bathy, Chla, SST, DaylightHours, MoonPhase, Region
# Number of observations:  534

R> system.time(res1 <- varimp(SBcf, conditional = FALSE))
#   user  system elapsed
#  5.971   0.012   5.994
R> system.time(res2 <- varimp(SBcf, conditional = TRUE))
#   user  system elapsed
# 2704.1    58.2  2768.0
R> res1 
#         Slope      Dist2Shr         Bathy          Chla           SST
#      0.003633      0.015908      0.008286      0.085367      0.028847
# DaylightHours     MoonPhase        Region
#      0.009520      0.002969      0.093061
R> res2 
#         Slope      Dist2Shr         Bathy          Chla           SST
#    -6.122e-05     2.449e-03    -4.082e-05     1.004e-02     3.367e-03
# DaylightHours     MoonPhase        Region
#     5.714e-04     6.735e-04     1.067e-02

Regarding "r - cforest varimp does not seem to work with categorical predictors", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20343974/
