
r - Inconsistent "best tune" and "Resampling results across tuning parameters" in the caret R package

Reposted. Author: 行者123. Last updated: 2023-11-30 08:34:10

I am trying to build a model with caret using a tuning grid:

svmGrid <- expand.grid(C = c(0.0001,0.001,0.01,0.1,1,10,20,30,40,50,100))

and then again using a subset of that grid:

svmGrid <- expand.grid(C = c(0.0001,0.001,0.01,0.1,1,10,20,30,40,50))

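The problem is that I get a different "best tune" and different "Resampling results across tuning parameters", even though the value of C selected with the first tuning grid also appears in the second tuning grid.

I also see these differences when I use different options for the sampling argument, and different summaryFunction options, in trainControl().

Needless to say, because a different best model is selected each time, the predictions on the test set are affected as well.

Does anyone know why this happens?

Reproducible data set: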

library(caret)
library(doMC)
registerDoMC(cores = 8)

set.seed(2969)
imbal_train <- twoClassSim(100, intercept = -20, linearVars = 20)
imbal_test <- twoClassSim(100, intercept = -20, linearVars = 20)
table(imbal_train$Class)

Run with the first tuning grid:

svmGrid <-  expand.grid(C = c(0.0001,0.001,0.01,0.1,1,10,20,30,40,50,100))

up_fitControl = trainControl(method = "cv", number = 10 , savePredictions = TRUE, allowParallel = TRUE, sampling = "up", seeds = NA)


set.seed(5627)
up_inside <- train(Class ~ ., data = imbal_train,
method = "svmLinear",
trControl = up_fitControl,
tuneGrid = svmGrid,
scale = FALSE)

up_inside

Output of the first run:

> up_inside
Support Vector Machines with Linear Kernel

100 samples
25 predictors
2 classes: 'Class1', 'Class2'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ...
Addtional sampling using up-sampling

Resampling results across tuning parameters:

C      Accuracy    Kappa        Accuracy SD  Kappa SD
1e-04  0.7734343   0.252201364  0.1227632    0.3198165
1e-03  0.8225253   0.396439198  0.1245455    0.3626456
1e-02  0.7595960   0.116150973  0.1431780    0.3046825
1e-01  0.7686869   0.051430454  0.1167093    0.2712062
1e+00  0.7695960  -0.004261294  0.1162279    0.2190151
1e+01  0.7093939   0.111852756  0.2030250    0.3810059
2e+01  0.7195960   0.040458804  0.1932690    0.2580560
3e+01  0.7195960   0.040458804  0.1932690    0.2580560
4e+01  0.7195960   0.040458804  0.1932690    0.2580560
5e+01  0.7195960   0.040458804  0.1932690    0.2580560
1e+02  0.7195960   0.040458804  0.1932690    0.2580560

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was C = 0.001.

Run with the second tuning grid:

svmGrid <-  expand.grid(C = c(0.0001,0.001,0.01,0.1,1,10,20,30,40,50))

up_fitControl = trainControl(method = "cv", number = 10 , savePredictions = TRUE, allowParallel = TRUE, sampling = "up", seeds = NA)


set.seed(5627)
up_inside <- train(Class ~ ., data = imbal_train,
method = "svmLinear",
trControl = up_fitControl,
tuneGrid = svmGrid,
scale = FALSE)

up_inside

Output of the second run:

> up_inside
Support Vector Machines with Linear Kernel

100 samples
25 predictors
2 classes: 'Class1', 'Class2'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ...
Addtional sampling using up-sampling

Resampling results across tuning parameters:

C      Accuracy    Kappa        Accuracy SD  Kappa SD
1e-04  0.8125253   0.392165694  0.13043060   0.3694786
1e-03  0.8114141   0.375569633  0.12291273   0.3549978
1e-02  0.7995960   0.205413345  0.06734882   0.2662161
1e-01  0.7495960   0.017139266  0.09742161   0.2270128
1e+00  0.7695960  -0.004261294  0.11622791   0.2190151
1e+01  0.7093939   0.111852756  0.20302503   0.3810059
2e+01  0.7195960   0.040458804  0.19326904   0.2580560
3e+01  0.7195960   0.040458804  0.19326904   0.2580560
4e+01  0.7195960   0.040458804  0.19326904   0.2580560
5e+01  0.7195960   0.040458804  0.19326904   0.2580560

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was C = 1e-04.

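Best answer

If you do not provide seeds to caret, it chooses them for you. Because your two grids differ in length, the seeds assigned to each fold differ slightly as well.

Below I have pasted an example in which I only renamed your second model (up_inside2), so that the outputs are easier to compare: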

> up_inside$control$seeds[[1]]
[1] 825016 802597 128276 935565 324036 188187 284067 58853 923008 995461 60759
> up_inside2$control$seeds[[1]]
[1] 825016 802597 128276 935565 324036 188187 284067 58853 923008 995461
> up_inside$control$seeds[[2]]
[1] 966837 256990 592077 291736 615683 390075 967327 349693 73789 155739 916233
# See how the first seed here is the same as the last seed of the first model
> up_inside2$control$seeds[[2]]
[1] 60759 966837 256990 592077 291736 615683 390075 967327 349693 73789
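
The shift seen in the seeds above can be reproduced in base R with a simplified model. This is only a sketch and not caret's actual source: it assumes that a single stream of random integers is cut into per-fold chunks whose length equals the number of grid rows. The helper name chunk_seeds is hypothetical.

```r
# Simplified illustration (an assumption about the mechanism, not caret code):
# one stream of random integers is cut into per-fold chunks, one seed per
# grid row in each fold.
set.seed(5627)
stream <- sample.int(1000000L, 10L * 11L + 1L)

# Hypothetical helper: split the first n_folds * grid_len seeds into
# n_folds consecutive chunks of length grid_len.
chunk_seeds <- function(stream, n_folds, grid_len) {
  idx <- seq_len(n_folds * grid_len)
  unname(split(stream[idx], rep(seq_len(n_folds), each = grid_len)))
}

seeds11 <- chunk_seeds(stream, n_folds = 10L, grid_len = 11L)
seeds10 <- chunk_seeds(stream, n_folds = 10L, grid_len = 10L)

# Fold 1: the shorter grid gets the first 10 of the same 11 seeds.
identical(seeds11[[1]][1:10], seeds10[[1]])
# Fold 2: the shorter grid starts with the seed left over from fold 1,
# so from here on the same C value is paired with a different seed.
identical(seeds11[[1]][11], seeds10[[2]][1])
```

Under this model only the first fold lines up between the two grids. Every later fold is offset, which is consistent with the two runs reporting different resampling results for the same C.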

If you now go ahead and set your own seeds, you will get the same output:

# Seeds for your first train
myseeds <- list(c(1:10,1000), c(11:20,2000), c(21:30, 3000),c(31:40, 4000),c(41:50, 5000),
c(51:60, 6000),c(61:70, 7000),c(71:80, 8000),c(81:90, 9000),c(91:100, 10000), c(343))
# Seeds for your second train
myseeds2 <- list(c(1:10), c(11:20), c(21:30),c(31:40),c(41:50),c(51:60),
c(61:70),c(71:80),c(81:90),c(91:100), c(343))
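
The answer does not show how the seed lists are handed to caret. A minimal sketch, assuming they are passed through the seeds argument of trainControl() and that everything else stays as in the question (the parallel backend is left out here). The seed lists are rebuilt compactly so the block stands alone, and the names ctrl1, ctrl2, fit1 and fit2 are chosen only for this illustration.

```r
# Same seed lists as above, built programmatically: ten integer vectors
# (one per fold, one seed per grid row) plus a single seed for the final fit.
myseeds  <- c(lapply(0:9, function(i) c(i * 10L + 1:10, (i + 1L) * 1000L)),
              list(343L))
myseeds2 <- c(lapply(0:9, function(i) i * 10L + 1:10), list(343L))

grid1 <- expand.grid(C = c(0.0001, 0.001, 0.01, 0.1, 1, 10, 20, 30, 40, 50, 100))
grid2 <- expand.grid(C = c(0.0001, 0.001, 0.01, 0.1, 1, 10, 20, 30, 40, 50))

# The model fits need caret and kernlab; skip them if either is missing.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("kernlab", quietly = TRUE)) {
  library(caret)

  set.seed(2969)
  imbal_train <- twoClassSim(100, intercept = -20, linearVars = 20)

  # Identical controls except for the seed list matching each grid.
  ctrl1 <- trainControl(method = "cv", number = 10, savePredictions = TRUE,
                        sampling = "up", seeds = myseeds)
  ctrl2 <- trainControl(method = "cv", number = 10, savePredictions = TRUE,
                        sampling = "up", seeds = myseeds2)

  set.seed(5627)  # same fold assignment for both fits
  fit1 <- train(Class ~ ., data = imbal_train, method = "svmLinear",
                trControl = ctrl1, tuneGrid = grid1, scale = FALSE)
  set.seed(5627)
  fit2 <- train(Class ~ ., data = imbal_train, method = "svmLinear",
                trControl = ctrl2, tuneGrid = grid2, scale = FALSE)

  # Compare the rows for the ten C values that both grids share.
  print(all.equal(fit1$results$Accuracy[1:10], fit2$results$Accuracy))
}
```

The point of the construction is that each shared C value receives the same seed in the same fold in both lists, while the extra value C = 100 gets its own seed at the end of each vector.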

> up_inside
Support Vector Machines with Linear Kernel

100 samples
25 predictor
2 classes: 'Class1', 'Class2'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ...
Addtional sampling using up-sampling

Resampling results across tuning parameters:

C Accuracy Kappa
1e-04 0.7714141 0.239823027
1e-03 0.7914141 0.332834590
1e-02 0.7695960 0.207000745
1e-01 0.7786869 0.103957926
1e+00 0.7795960 0.006849817
1e+01 0.7093939 0.111852756
2e+01 0.7195960 0.040458804
3e+01 0.7195960 0.040458804
4e+01 0.7195960 0.040458804
5e+01 0.7195960 0.040458804
1e+02 0.7195960 0.040458804

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was C = 0.001.
> up_inside2
Support Vector Machines with Linear Kernel

100 samples
25 predictor
2 classes: 'Class1', 'Class2'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ...
Addtional sampling using up-sampling

Resampling results across tuning parameters:

C Accuracy Kappa
1e-04 0.7714141 0.239823027
1e-03 0.7914141 0.332834590
1e-02 0.7695960 0.207000745
1e-01 0.7786869 0.103957926
1e+00 0.7795960 0.006849817
1e+01 0.7093939 0.111852756
2e+01 0.7195960 0.040458804
3e+01 0.7195960 0.040458804
4e+01 0.7195960 0.040458804
5e+01 0.7195960 0.040458804

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was C = 0.001.
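
Writing the seed lists by hand becomes tedious when several grid subsets are compared. One possible generalization is sketched below. It assumes, as the hand-written lists above do, that caret picks each fold's seed by the row position of the parameter in the tuning grid. The helper make_seeds and its arguments are hypothetical and not part of caret.

```r
# Hypothetical helper: give every candidate value a fixed seed per fold,
# derived from its position in a master list of all values that might be
# tried, so any subset of the grid reuses the same seeds.
make_seeds <- function(grid_values, master_values, n_resamples,
                       final_seed = 343L) {
  pos <- match(grid_values, master_values)
  if (anyNA(pos)) stop("every grid value must appear in master_values")
  per_fold <- lapply(seq_len(n_resamples), function(fold) {
    (fold - 1L) * length(master_values) + pos
  })
  # caret expects one extra element: a single seed for the final model.
  c(per_fold, list(as.integer(final_seed)))
}

full_C <- c(0.0001, 0.001, 0.01, 0.1, 1, 10, 20, 30, 40, 50, 100)
sub_C  <- full_C[1:10]

seeds_full <- make_seeds(full_C, full_C, n_resamples = 10L)
seeds_sub  <- make_seeds(sub_C,  full_C, n_resamples = 10L)

# In every fold, the shared C values carry identical seeds.
all(mapply(function(a, b) identical(a[1:10], b),
           seeds_full[1:10], seeds_sub[1:10]))
```

Each result could then be passed as trainControl(seeds = ...) for the corresponding grid. The row order of the tuning grid has to match the order of grid_values for the pairing to hold.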

Regarding r - Inconsistent "best tune" and "Resampling results across tuning parameters" in the caret R package, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38203806/
