
R caretEnsemble warning: indexes not defined in trControl

Reposted. Author: 行者123. Updated: 2023-12-01 22:43:32

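I have some R/caret code that fits several cross-validated models to some data, but I get a warning message that I can't find any information about. Is this something I should be concerned about?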

library(datasets)
library(caret)
library(caretEnsemble)

# load data
data("iris")

# establish cross-validation structure
set.seed(32)
trainControl <- trainControl(method="repeatedcv", number=5, repeats=3, savePredictions=TRUE, search="random")

# fit several (cross-validated) models
algorithmList <- c('lda', # Linear Discriminant Analysis
'rpart' , # Classification and Regression Trees
'svmRadial') # SVM with RBF Kernel

models <- caretList(Species~., data=iris, trControl=trainControl, methodList=algorithmList)

Log output:

Warning messages:
1: In trControlCheck(x = trControl, y = target) :
x$savePredictions == TRUE is depreciated. Setting to 'final' instead.
2: In trControlCheck(x = trControl, y = target) :
indexes not defined in trControl. Attempting to set them ourselves, so each model in the ensemble will have the same resampling indexes.

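...I thought my trainControl object defined a cross-validation structure (5-fold cross-validation repeated 3 times) that would generate a set of indexes for the CV splits. So I'm confused about why I'm getting this message.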

Best answer

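By default, trainControl does not generate the resampling indexes for you; it only acts as a way of passing the same set of parameters to each model you train.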
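This can be checked directly. The following is a minimal sketch, assuming the caret package is installed; the name `ctrl` is only illustrative and does not appear in the question.

```r
library(caret)

# A control object that only describes the scheme:
# 5-fold cross-validation repeated 3 times.
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3)

# No resampling indexes are stored on the object unless `index` is supplied.
is.null(ctrl$index)
```

Since the object carries only the description of the scheme, the actual row assignments are drawn later, when a model is fit.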

Searching the caretEnsemble GitHub issues for this warning turns up this particular issue:

You need to make sure that every model is fit with the EXACT same resampling folds. caretEnsemble builds the ensemble by merging together the test sets for each cross-validation fold, and you will get incorrect results if each fold has different observations in it.

Before you fit your models, you need to construct a trainControl object, and manually set the indexes in that object.

E.g. myControl <- trainControl(index=createFolds(y, 10)).

We are working on an interface to caretEnsemble that handles constructing the resampling strategy for you and then fitting multiple models using those resamples, but it is not yet finished.

To reiterate, that check is there for a reason. You need to set the index argument in trainControl, and pass the EXACT SAME indexes to each model you wish to ensemble.

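So this means that when you specify number = 5 and repeats = 3, the models do not actually receive a predetermined set of indexes saying which samples belong to which fold. Left to themselves, they would each generate their own indexes independently, which is why caretList, as the warning says, attempts to set shared indexes itself.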
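To see why independently drawn folds cannot be assumed to match, here is a small sketch that draws the same 5 x 3 scheme twice without resetting the seed. It assumes caret is installed and uses createMultiFolds as a stand-in for the fold generation that happens at fit time; `folds_a` and `folds_b` are illustrative names.

```r
library(caret)
data("iris")

set.seed(32)
# Two independent draws of 5-fold CV repeated 3 times.
folds_a <- createMultiFolds(iris$Species, k = 5, times = 3)
folds_b <- createMultiFolds(iris$Species, k = 5, times = 3)  # RNG state has advanced

# Same structure (one element per fold and repeat) ...
length(folds_a)
length(folds_b)

# ... but the rows assigned to each resample differ between the two draws.
identical(folds_a, folds_b)
```

Both draws describe the same scheme, yet the held-out rows per resample are different, so predictions from models fit on them could not be stacked row for row.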

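Therefore, to make sure the models agree on which samples belong to which fold, set the index argument in your trainControl object. Note that index expects the rows used for training in each resample, whereas createFolds(iris$Species, 5) returns the held-out rows by default (returnTrain = FALSE) and covers only a single repeat. For repeated cross-validation, createMultiFolds returns training rows for all 5 x 3 = 15 resamples:

# new trainControl object with index specified
set.seed(32)
trainControl <- trainControl(method = "repeatedcv",
                             number = 5,
                             repeats = 3,
                             # training-row indexes shared by every model in the ensemble
                             index = createMultiFolds(iris$Species, k = 5, times = 3),
                             savePredictions = "all",
                             search = "random")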
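Because index is documented as the rows used for training in each resample, it can be worth checking the sizes of whatever list is passed in before fitting. The sketch below assumes caret is installed; `train_idx`, `heldout_idx` and `ctrl` are illustrative names, not part of the original answer.

```r
library(caret)
data("iris")

set.seed(32)
# Training rows for 5-fold CV repeated 3 times.
train_idx <- createMultiFolds(iris$Species, k = 5, times = 3)

# createFolds with its defaults returns the held-out rows instead.
heldout_idx <- createFolds(iris$Species, k = 5)

# With 150 rows and 5 folds, a training set holds about 4/5 of the data
# and a held-out set about 1/5.
lengths(train_idx)
lengths(heldout_idx)

# The training-row list is what goes into `index`.
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3,
                     index = train_idx, savePredictions = "final")
length(ctrl$index)
```

If the list passed to index has elements of roughly one fifth of the data, each model would be trained on a single fold and evaluated on the rest, which is usually not what is intended.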

Regarding the R caretEnsemble warning "indexes not defined in trControl", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45155872/
