
r - Performing cross-validation on randomForest with R

Reposted. Author: 行者123. Updated: 2023-12-04 16:49:41

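I am using the R randomForest package to train a classification model. To compare it with other classifiers, I need a way to display all the information that Weka's rather verbose cross-validation method provides. The R script should therefore output something like the Weka output in [a].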

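  1. Is there a way to validate an R model through RWeka to produce these measures?
  2. If not, how can a random forest be cross-validated entirely in R?
  3. Can rfcv from the randomForest package be used here? I could not get it to work.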

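I know that the out-of-bag (OOB) error used in random forests is a kind of cross-validation. But I need the full set of measures for a fair comparison.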

What I have tried so far in R is [b]. However, the code also produces an error on my setup [c] because of missing values.

So, can you help me with the cross-validation?


Appendix

[a] Weka:

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          3059      96.712 %
Incorrectly Classified Instances         104       3.288 %
Kappa statistic                            0.8199
Mean absolute error                        0.1017
Root mean squared error                    0.1771
Relative absolute error                   60.4205 %
Root relative squared error               61.103 %
Coverage of cases (0.95 level)            99.6206 %
Mean rel. region size (0.95 level)        78.043 %
Total Number of Instances               3163

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0,918    0,028    0,771      0,918    0,838      0,824    0,985     0,901     sick-euthyroid
                 0,972    0,082    0,991      0,972    0,982      0,824    0,985     0,998     negative
Weighted Avg.    0,967    0,077    0,971      0,967    0,968      0,824    0,985     0,989

=== Confusion Matrix ===

   a    b   <-- classified as
 269   24 |   a = sick-euthyroid
  80 2790 |   b = negative

[b] Code so far:

library(randomForest) #randomForest() and rfImpute()
library(foreign) # read.arff()
library(caret) # train() and trainControl()

nTrees <- 2 # 200
myDataset <- 'D:\\your\\directory\\SE.arff' # http://hakank.org/weka/SE.arff

mydb = read.arff(myDataset)
mydb.imputed <- rfImpute(class ~ ., data=mydb, ntree = nTrees, importance = TRUE)
myres.rf <- randomForest(class ~ ., data=mydb.imputed, ntree = nTrees, importance = TRUE)
summary(myres.rf)

# specify type of resampling to 10-fold CV
fitControl <- trainControl(method = "rf",number = 10,repeats = 10)
set.seed(825)

# deal with NA | NULL values in categorical variables
#mydb.imputed[is.na(mydb.imputed)] <- 1
#mydb.imputed[is.null(mydb.imputed)] <- 1

rfFit <- train(class ~ ., data = mydb.imputed,
               method = "rf",
               trControl = fitControl,
               ## This last option is actually one
               ## for rf() that passes through
               ntree = nTrees, importance = TRUE, na.action = na.omit)
rfFit

The error is:

Error in names(resamples) <- gsub("^\\.", "", names(resamples)) : 
attempt to set an attribute on NULL

Using traceback():

5: nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, 
method = models, ppOpts = preProcess, ctrl = trControl, lev = classLevels,
...)
4: train.default(x, y, weights = w, ...)
3: train(x, y, weights = w, ...)
2: train.formula(class~ ., data = mydb.imputed, method = "rf",
trControl = fitControl, ntree = nTrees, importance = TRUE,
sampsize = rep(minorityClassNum, 2), na.action = na.omit)
1: train(class~ ., data = mydb.imputed, method = "rf", trControl = fitControl,
ntree = nTrees, importance = TRUE, sampsize = rep(minorityClassNum,
2), na.action = na.omit) at #39
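
A note on this trace: the method argument of trainControl() names the resampling scheme (for example "cv", "repeatedcv" or "boot"), not the model, so method = "rf" in the trainControl() call in [b] is a likely cause of the NULL resamples seen here. That is an inference from the trace and has not been checked against the original data. The sketch below shows one way the call could look. It uses the built-in iris data as a stand-in for SE.arff, and the names fitControl, rfFit and cvStats are illustrative only.

```r
library(caret)          # train(), trainControl(), confusionMatrix()
library(randomForest)   # backend used by method = "rf"

set.seed(825)

# 10-fold cross-validation; savePredictions = "final" keeps the held-out
# predictions for the selected mtry so they can be tabulated afterwards.
# method = "repeatedcv" with repeats = 10 would be the repeated variant.
fitControl <- trainControl(method = "cv", number = 10,
                           savePredictions = "final")

rfFit <- train(Species ~ ., data = iris,   # iris is placeholder data
               method = "rf",              # the model is named here
               trControl = fitControl,
               ntree = 200)                # passed through to randomForest()

# Confusion matrix plus accuracy, kappa and per-class precision/recall/F1,
# computed from the held-out predictions of all folds
cvStats <- confusionMatrix(data = rfFit$pred$pred,
                           reference = rfFit$pred$obs,
                           mode = "everything")
print(cvStats)
```

The printed object covers much of what Weka reports in [a]: the confusion matrix, overall accuracy, the kappa statistic and per-class precision, recall and F1.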

[c] R version information from sessionInfo():

R version 3.1.0 (2014-04-10)
Platform: i386-w64-mingw32/i386 (32-bit)

[...]

other attached packages:
[1] e1071_1.6-3 caret_6.0-30 ggplot2_1.0.0 foreign_0.8-61 randomForest_4.6-7 DMwR_0.4.1
[7] lattice_0.20-29 JGR_1.7-16 iplots_1.1-7 JavaGD_0.6-1 rJava_0.9-6

Best answer

I don't know Weka, but I have done random forest modeling in R, and I have always used R's predict function for this.

Try using this function:

predict(Model,data)
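
As a minimal sketch of that call, assuming the built-in iris data in place of the real data set and a simple 70/30 hold-out split (the names test_idx, train_set, test_set, model, pred and prob are illustrative):

```r
library(randomForest)

set.seed(42)

# Hold out 45 of the 150 rows (30%) for testing
test_idx  <- sample(nrow(iris), size = 45)
train_set <- iris[-test_idx, ]
test_set  <- iris[test_idx, ]

model <- randomForest(Species ~ ., data = train_set, ntree = 200)

# For a classification forest, predict() returns a factor of predicted classes
pred <- predict(model, newdata = test_set)
head(pred)

# type = "prob" returns the fraction of tree votes per class instead
prob <- predict(model, newdata = test_set, type = "prob")
head(prob)
```

Predicting on rows the model has not seen is what makes the resulting error estimate comparable to a cross-validated one. Predicting on the training data itself would be far too optimistic.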

Bind the output to the original values and use the table command to get the confusion matrix.
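
Extending this idea to a full cross-validation could look like the sketch below. It is not the answerer's code: it assumes the iris data as a placeholder, 10 stratified folds, and illustrative names (dat, folds, pred, cm, kappa_stat). Each row is predicted once by a forest that did not see it, the predictions are bound to the original data, and the table is used to derive some of the measures Weka prints.

```r
library(randomForest)

set.seed(825)
dat <- iris                      # placeholder data; the response is Species
k   <- 10

# Stratified fold assignment: deal the fold ids out within each class
folds <- integer(nrow(dat))
for (cls in levels(dat$Species)) {
  idx <- which(dat$Species == cls)
  folds[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
}

# Out-of-fold predictions: every row is predicted exactly once
pred <- factor(rep(NA, nrow(dat)), levels = levels(dat$Species))
for (i in seq_len(k)) {
  held_out <- folds == i
  fit <- randomForest(Species ~ ., data = dat[!held_out, ], ntree = 200)
  pred[held_out] <- predict(fit, newdata = dat[held_out, ])
}

# Bind predictions to the original values and tabulate.
# Rows are actual classes and columns are predicted classes, as in Weka.
results <- cbind(dat, predicted = pred)
cm <- table(actual = results$Species, predicted = results$predicted)
print(cm)

n        <- sum(cm)
accuracy <- sum(diag(cm)) / n
# Cohen's kappa: observed agreement corrected for chance agreement
expected   <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_stat <- (accuracy - expected) / (1 - expected)

# Per-class figures in the style of "Detailed Accuracy By Class"
recall    <- diag(cm) / rowSums(cm)      # TP rate
precision <- diag(cm) / colSums(cm)
f_measure <- 2 * precision * recall / (precision + recall)
print(round(cbind(precision, recall, f_measure), 3))
cat(sprintf("Accuracy: %.3f  Kappa: %.3f\n", accuracy, kappa_stat))
```

Measures such as ROC area or mean absolute error would additionally need the class probabilities from predict(..., type = "prob"), which this sketch does not collect.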

Regarding "r - Performing cross-validation on randomForest with R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24364430/
