
r - How to repeat hyperparameter tuning (alpha and/or lambda) of glmnet in mlr3


I would like to repeat the hyperparameter tuning (alpha and/or lambda) of glmnet in mlr3 to avoid variability in smaller data sets.
In caret, I can do this with "repeatedcv".
Because I really like the mlr3 family of packages, I would like to use them for my analysis. However, I am not sure about the correct way to do this step in mlr3.
Example data

#library
library(caret)
library(mlr3verse)
library(mlbench)

# get example data
data(PimaIndiansDiabetes, package="mlbench")
data <- PimaIndiansDiabetes

# get small training data
train.data <- data[1:60,]
Created on 2021-03-18 by the reprex package (v1.0.0)
caret approach (tuning alpha and lambda) with "cv" and "repeatedcv"

trControlCv <- trainControl("cv",
                            number = 5,
                            classProbs = TRUE,
                            savePredictions = TRUE,
                            summaryFunction = twoClassSummary)

# use "repeatedcv" to avoid variability in smaller data sets
trControlRCv <- trainControl("repeatedcv",
                             number = 5,
                             repeats = 20,
                             classProbs = TRUE,
                             savePredictions = TRUE,
                             summaryFunction = twoClassSummary)

# train and extract coefficients with "cv" and different set.seed
set.seed(2323)
model <- train(
diabetes ~., data = train.data, method = "glmnet",
trControl = trControlCv,
tuneLength = 10,
metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef1

set.seed(23)
model <- train(
diabetes ~., data = train.data, method = "glmnet",
trControl = trControlCv,
tuneLength = 10,
metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef2


# train and extract coefficients with "repeatedcv" and different set.seed
set.seed(13)

model <- train(
diabetes ~., data = train.data, method = "glmnet",
trControl = trControlRCv,
tuneLength = 10,
metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef3


set.seed(55)
model <- train(
diabetes ~., data = train.data, method = "glmnet",
trControl = trControlRCv,
tuneLength = 10,
metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef4

Created on 2021-03-18 by the reprex package (v1.0.0)
Showing different coefficients with cross-validation and identical coefficients with repeated cross-validation
# with "cv" I get different coefficients
identical(coef1, coef2)
#> [1] FALSE

# with "repeatedcv" I get the same coefficients
identical(coef3,coef4)
#> [1] TRUE

Created on 2021-03-18 by the reprex package (v1.0.0)
First mlr3 approach using cv.glmnet (performs internal tuning of lambda)
# create elastic net regression
glmnet_lrn = lrn("classif.cv_glmnet", predict_type = "prob")

# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")

# create learner
learner = as_learner(glmnet_lrn)

# train the learner with different set.seed
set.seed(2323)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef1

set.seed(23)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef2
Created on 2021-03-18 by the reprex package (v1.0.0)
Showing different coefficients with cross-validation
# compare coefficients
coef1
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#> 1
#> (Intercept) -3.323460895
#> age 0.005065928
#> glucose 0.019727881
#> insulin .
#> mass .
#> pedigree .
#> pregnant 0.001290570
#> pressure .
#> triceps 0.020529162
coef2
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#> 1
#> (Intercept) -3.146190752
#> age 0.003840963
#> glucose 0.019015433
#> insulin .
#> mass .
#> pedigree .
#> pregnant .
#> pressure .
#> triceps 0.018841557
Created on 2021-03-18 by the reprex package (v1.0.0)
Update 1: the progress I made
Based on the comments below and this comment, I can use rsmp and AutoTuner. This answer suggests not to tune cv.glmnet but glmnet (which was not available in mlr3 at the time).
Second mlr3 approach using glmnet (repeats the tuning of alpha and lambda)
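The differing coefficients above come from cv.glmnet drawing new random folds on every call. Outside of mlr3, this variability can be removed by pinning the fold assignment with cv.glmnet's foldid argument; a minimal sketch (assuming glmnet and mlbench are installed, reusing the training data from above):

```r
library(glmnet)
library(mlbench)

# same small training set as above
data(PimaIndiansDiabetes, package = "mlbench")
train.data <- PimaIndiansDiabetes[1:60, ]

# model matrix (drop the intercept column) and response
x <- model.matrix(diabetes ~ . - 1, data = train.data)
y <- train.data$diabetes

# fix the fold assignment once, then reuse it: both fits see identical folds
set.seed(1)
foldid <- sample(rep(1:5, length.out = nrow(x)))

fit1 <- cv.glmnet(x, y, family = "binomial", foldid = foldid)
fit2 <- cv.glmnet(x, y, family = "binomial", foldid = foldid)

identical(fit1$lambda.min, fit2$lambda.min)  # TRUE: fixed folds, fixed result
```

This pins one fold split rather than averaging over many, so it removes the variability without the smoothing effect of repeated CV.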
# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")

# create elastic net regression
glmnet_lrn = lrn("classif.glmnet", predict_type = "prob")

# turn to learner
learner = as_learner(glmnet_lrn)

# make search space
search_space = ps(
alpha = p_dbl(lower = 0, upper = 1),
s = p_dbl(lower = 1, upper = 1)
)

# set terminator
terminator = trm("evals", n_evals = 20)

#set tuner
tuner = tnr("grid_search", resolution = 3)

# tune the learner
at = AutoTuner$new(
  learner = learner,
  resampling = rsmp("repeated_cv"),
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = terminator,
  tuner = tuner)

at
#> <AutoTuner:classif.glmnet.tuned>
#> * Model: -
#> * Parameters: list()
#> * Packages: glmnet
#> * Predict Type: prob
#> * Feature types: logical, integer, numeric
#> * Properties: multiclass, twoclass, weights
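Note that rsmp("repeated_cv") above runs with mlr3's default folds and repeats. To mirror the caret setup (5 folds, 20 repeats), both can be set explicitly; a sketch, assuming only that mlr3 is loaded:

```r
library(mlr3)

# mirror caret's trainControl("repeatedcv", number = 5, repeats = 20)
resampling = rsmp("repeated_cv", folds = 5, repeats = 20)
resampling$iters  # 100 model fits per hyperparameter configuration
```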
Open question
How can I show that my second approach is valid and that I get the same or similar coefficients with different seeds? I.e., how can I extract the coefficients of the AutoTuner's final model?
set.seed(23)
at$train(train.task) -> tune1

set.seed(2323)
at$train(train.task) -> tune2
Created on 2021-03-18 by the reprex package (v1.0.0)

Best answer

Repeated hyperparameter tuning (alpha and lambda) of glmnet can be done with the second mlr3 approach described above.
The coefficients can be extracted with stats::coef and the values stored in the AutoTuner:

coef(tune1$model$learner$model, alpha=tune1$tuning_result$alpha,s=tune1$tuning_result$s)
# 9 x 1 sparse Matrix of class "dgCMatrix"
# 1
# (Intercept) -1.6359082102
# age 0.0075541841
# glucose 0.0044351365
# insulin 0.0005821515
# mass 0.0077104934
# pedigree 0.0911233031
# pregnant 0.0164721202
# pressure 0.0007055435
# triceps 0.0056942014
coef(tune2$model$learner$model, alpha=tune2$tuning_result$alpha,s=tune2$tuning_result$s)
# 9 x 1 sparse Matrix of class "dgCMatrix"
# 1
# (Intercept) -1.6359082102
# age 0.0075541841
# glucose 0.0044351365
# insulin 0.0005821515
# mass 0.0077104934
# pedigree 0.0911233031
# pregnant 0.0164721202
# pressure 0.0007055435
# triceps 0.0056942014
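To mirror the identical() check used for caret earlier, the two sparse coefficient matrices can be converted to dense form and compared directly. A base-R sketch using stand-in vectors built from the printed values above (substitute as.matrix(coef(...)) of the real tune1/tune2 output):

```r
# Stand-ins for the first entries of the two extracted coefficient vectors;
# in practice use as.matrix(coef(tune1$model$learner$model, ...)) etc.
coef_a <- c(`(Intercept)` = -1.6359082102, age = 0.0075541841, glucose = 0.0044351365)
coef_b <- c(`(Intercept)` = -1.6359082102, age = 0.0075541841, glucose = 0.0044351365)

identical(coef_a, coef_b)          # TRUE when the repeated tuning is stable across seeds
isTRUE(all.equal(coef_a, coef_b))  # all.equal() tolerates floating-point noise
```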

Regarding "r - How to repeat hyperparameter tuning (alpha and/or lambda) of glmnet in mlr3", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66696405/
