
r - Creating a multivariate matrix in tidymodels recipes::recipe()

Reposted. Author: 行者123. Updated: 2023-12-05 04:31:53

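I am trying to run k-fold cross-validation on a model that predicts the joint distribution of tree species' proportions of basal area from satellite imagery. This requires the DirichletReg::DirichReg() function, which in turn requires the response variables to be prepared as a matrix with DirichletReg::DR_data(). I first tried to do this with the caret:: package, but found that caret:: does not support multivariate responses. Since then I have been trying to implement it in the tidymodels:: suite of packages. Following the documentation on how to register a new model in the parsnip:: package (I appreciate Max Kuhn's vegetable humor), I created a "DREG" model and a "DR" engine. My registered model works when I simply call it on a single training dataset, but my goal is k-fold cross-validation using vfold_cv(), a workflow(), and the fit_resamples() function. With the code I currently have, I get this warning message: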

Warning message:
All models failed. See the `.notes` column.

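The notes state: Error in get(resp_char, environment(oformula)): object 'cbind(PSME, TSHE, ALRU2)' not found. I believe this is due to using DR_data() to preprocess the response variables into the format that DirichletReg::DirichReg() needs to run properly. I think the solution involves doing the preprocessing either in the recipe() call or in the set_fit() call when I register the model with parsnip::. I tried the step_mutate_at() function when specifying the recipe, but it applies a function to each column rather than applying the function with the columns as input. This leads to the following error in the "notes" of the fit_resamples() output: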

Must subset columns with a valid subscript vector.
Subscript has the wrong type `quosures`.
It must be numeric or character.

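Is there a way to convert several columns into a DirichletRegData object with the DR_data() function, either through a step_*() function in the recipe or through the pre= argument in set_fit() and set_pred()?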

My reproducible example is below:

##Loading Necessary Packages##
library(tidymodels)
library(DirichletReg)

##Creating Fake Data##
set.seed(88)#For reproducibility

#Response variables#
PSME_BA<-rnorm(100,50, 15)
TSHE_BA<-rnorm(100,40,12)
ALRU2_BA<-rnorm(100,20,0.5)
Total_BA<-PSME_BA+TSHE_BA+ALRU2_BA

#Predictor variables#
B1<-runif(100, 0, 2000)
B2<-runif(100, 0, 1800)
B3<-runif(100, 0, 3000)

#Dataset for modeling#
DF<-data.frame(PSME=PSME_BA/Total_BA, TSHE=TSHE_BA/Total_BA, ALRU2=ALRU2_BA/Total_BA,
               B1=B1, B2=B2, B3=B3)

##Modeling the data using Dirichlet regression with repeated k-folds cross validation##
#Registering the model to parsnip::#
set_new_model("DREG")
set_model_mode(model="DREG", mode="regression")
set_model_engine("DREG", mode="regression", eng="DR")
set_dependency("DREG", eng="DR", pkg="DirichletReg")

set_model_arg(
  model = "DREG",
  eng = "DR",
  parsnip = "param",
  original = "model",
  func = list(pkg = "DirichletReg", fun = "DirichReg"),
  has_submodel = FALSE
)

DREG <-
  function(mode = "regression", param = NULL) {
    # Check for correct mode
    if (mode != "regression") {
      rlang::abort("`mode` should be 'regression'")
    }

    # Capture the arguments in quosures
    args <- list(sub_classes = rlang::enquo(param))

    # Save some empty slots for future parts of the specification
    new_model_spec(
      "DREG",
      args=args,
      eng_args = NULL,
      mode = mode,
      method = NULL,
      engine = NULL
    )
  }

set_fit(
  model = "DREG",
  eng = "DR",
  mode = "regression",
  value = list(
    interface = "formula",
    protect = NULL,
    func = c(pkg = "DirichletReg", fun = "DirichReg"),
    defaults = list()
  )
)

set_encoding(
  model = "DREG",
  eng = "DR",
  mode = "regression",
  options = list(
    predictor_indicators = "none",
    compute_intercept = TRUE,
    remove_intercept = TRUE,
    allow_sparse_x = FALSE
  )
)

set_pred(
  model = "DREG",
  eng = "DR",
  mode = "regression",
  type = "numeric",
  value = list(
    pre = NULL,
    post = NULL,
    func = c(fun = "predict.DirichletRegModel"),
    args =
      list(
        object = expr(object$fit),
        newdata = expr(new_data),
        type = "response"
      )
  )
)

##Running the Model##
DF$Y<-DR_data(DF[,c(1:3)]) #Preparing the response variables

dreg_spec<-DREG(param="alternative") %>%
  set_engine("DR")

dreg_mod<-dreg_spec %>%
  fit(Y~B1+B2+B3, data = DF)#Model works when simply run on single dataset

##Attempting Crossvalidation##
#First attempt - simply call Y as the response variable in the recipe#
kfolds<-vfold_cv(DF, v=10, repeats = 2)
rcp<-recipe(Y~B1+B2+B3, data=DF)

dreg_fit<- workflow() %>%
  add_model(dreg_spec) %>%
  add_recipe(rcp)

dreg_rsmpl<-dreg_fit %>%
  fit_resamples(kfolds)#Throws warning about all models failing

#Second attempt - use step_mutate_at()#
rcp<-recipe(~B1+B2+B3, data=DF) %>%
  step_mutate_at(fn=DR_data, var=vars(PSME, TSHE, ALRU2))

dreg_fit<- workflow() %>%
  add_model(dreg_spec) %>%
  add_recipe(rcp)

dreg_rsmpl<-dreg_fit %>%
  fit_resamples(kfolds)#Throws warning about all models failing

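Best answer

This works, but I'm not sure whether it is what you were expecting.

First: setting up the data for CV and DR_data()

I'm not aware of any package that acts as a bridge between CV and DirichletReg, so that part is done manually. You may be surprised to find that it isn't that complicated.

Using the data you created and the modeling objects you registered for tidymodels (the calls prefixed with set_), I created the CV structure you were trying to use.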

df1 <- data.frame(PSME = PSME_BA/Total_BA, TSHE = TSHE_BA/Total_BA,
                  ALRU2=ALRU2_BA/Total_BA, B1, B2, B3)

set.seed(88)
kDf2 <- kDf1 <- vfold_cv(df1, v=10, repeats = 2)
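
As a rough picture of what a 10-fold, 2-repeat structure contains, the sketch below builds a comparable one by hand in base R. It is not how rsample implements vfold_cv(); the helper name make_repeated_folds and its list layout are hypothetical, chosen only to mirror the in_id idea used in this answer.

```r
# Hypothetical helper: repeated v-fold index sets built by hand (base R only).
make_repeated_folds <- function(n, v = 10, repeats = 2) {
  out <- list()
  for (r in seq_len(repeats)) {
    # Randomly assign each row to one of v folds of (nearly) equal size
    fold_id <- sample(rep(seq_len(v), length.out = n))
    for (k in seq_len(v)) {
      out[[length(out) + 1]] <- list(
        rep_id = r,
        fold   = k,
        in_id  = which(fold_id != k),  # analysis (training) rows
        out_id = which(fold_id == k)   # assessment (held-out) rows
      )
    }
  }
  out
}

set.seed(88)
folds <- make_repeated_folds(n = 100, v = 10, repeats = 2)

length(folds)                    # number of resamples (folds x repeats)
length(folds[[1]]$in_id)         # rows used for fitting in the first resample
length(folds[[1]]$out_id)        # rows held out in the first resample
```

Each element pairs a set of fitting rows with the rows held out for that fold, which is the same pairing the splits in kDf1 carry.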

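For each of the 20 subset data frames identified in kDf2, I used DR_data to set up the data for the model. (The code below modifies kDf1 in place; kDf2 keeps the untouched copy.)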

# convert to DR_data (each fold and repeat)
df2 <- map(1:20,
           .f = function(x){
             in_ids = kDf1$splits[[x]]$in_id
             dd <- kDf1$splits[[x]]$data[in_ids, ] # filter rows BEFORE DR_data
             dd$Y <- DR_data(dd[, 1:3])
             kDf1$splits[[x]]$data <<- dd
           })
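
Since the response is rebuilt for every fold, it can be worth confirming that each fold's three response columns still form a valid composition (finite, strictly positive, each row summing to 1) before handing them to DR_data(). The check below is a minimal base R sketch; the function name is_valid_composition and the tolerance are assumptions, not part of DirichletReg.

```r
# Hypothetical helper: TRUE when every row is a valid composition.
is_valid_composition <- function(x, tol = 1e-8) {
  m <- as.matrix(x)
  all(is.finite(m)) &&                  # no NA / NaN / Inf
    all(m > 0) &&                       # strictly positive parts
    all(abs(rowSums(m) - 1) < tol)      # each row sums to 1
}

# Toy data shaped like the three response columns (values are made up)
set.seed(88)
raw <- matrix(runif(30, min = 1, max = 10), ncol = 3,
              dimnames = list(NULL, c("PSME", "TSHE", "ALRU2")))
props <- raw / rowSums(raw)

is_valid_composition(raw)    # raw basal areas are not proportions
is_valid_composition(props)  # row-normalised values are
```

In the answer's loop this could be called on dd[, 1:3] just before the DR_data() line.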

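Because I'm not very familiar with tidymodels, I did the modeling with DirichReg directly first. Then I did it again with tidymodels and compared the two. (The output is identical.)

Fitting the models with DirichReg, and summaries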

set.seed(88)
# perform crossfold validation on Dirichlet Model
df2.fit <- map(1:20,
               .f = function(x){
                 Rpt = kDf1$splits[[x]]$id$id
                 Fld = kDf1$splits[[x]]$id$id2
                 daf = kDf1$splits[[x]]$data
                 fit = DirichReg(Y ~ B1 + B2, daf)
                 list(Rept = Rpt, Fold = Fld, fit = fit)
               })
# summary of each fitted model
fit.a <- map(1:20,
             .f = function(x){
               summary(df2.fit[[x]]$fit)
             })
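
The fits above use only the analysis rows of each fold. If you also want a held-out score per fold, one option is to compare predicted and observed proportions on the assessment rows. The sketch below shows only the scoring step in base R, assuming you already have two matrices of proportions with matching columns; the function name score_fold and the choice of per-component RMSE are assumptions, and obtaining the predictions from a DirichReg fit is deliberately left out.

```r
# Hypothetical scoring helper: RMSE per response component for one fold.
score_fold <- function(observed, predicted) {
  observed  <- as.matrix(observed)
  predicted <- as.matrix(predicted)
  stopifnot(identical(dim(observed), dim(predicted)))
  sqrt(colMeans((observed - predicted)^2))
}

# Made-up observed and predicted proportions for two held-out rows
cols <- c("PSME", "TSHE", "ALRU2")
obs  <- matrix(c(0.5, 0.3, 0.2,
                 0.4, 0.4, 0.2), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, cols))
pred <- matrix(c(0.4, 0.4, 0.2,
                 0.4, 0.4, 0.2), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, cols))

fold_rmse <- score_fold(obs, pred)
fold_rmse
```

Averaging these per-fold vectors over the 20 resamples would give a cross-validated error for each species.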

tidymodels fits and summaries (the code looks the same, but there are a few differences, although the output is identical)

# I'm not sure what 'alternative' is supposed to do here?
dreg_spec <- DREG(param="alternative") %>% # this is not model = alternative
  set_engine("DR")

set.seed(88)
dfa.fit <- map(1:20,
               .f = function(x){
                 Rpt = kDf1$splits[[x]]$id$id
                 Fld = kDf1$splits[[x]]$id$id2
                 daf = kDf1$splits[[x]]$data
                 fit = dreg_spec %>%
                   fit(Y ~ B1 + B2, data = daf)
                 list(Rept = Rpt, Fold = Fld, fit = fit)
               })

afit.a <- map(1:20,
              .f = function(x){
                summary(dfa.fit[[x]]$fit$fit) # extra nest for parsnip
              })

If you want to see the first model:

fit.a[[1]]
afit.a[[1]]

If you want the model with the lowest AIC:

# compare AIC, BIC, and likelihood?
# what do you perceive as the best fit?
fmin = min(unlist(map(1:20, ~fit.a[[.x]]$aic))) # DirichReg fits

# find min AIC model number
paste0((map(1:20, ~ifelse(fit.a[[.x]]$aic == fmin, .x, ""))), collapse = "")

# 19 is the model number the previous line returned
fit.a[[19]]
afit.a[[19]]
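
A shorter way to locate the minimum is base R's which.min(), which returns the position of the first smallest value. The sketch below uses a made-up vector in place of the real AIC values; with the objects above, a vector like it could presumably be built with something such as vapply(fit.a, function(s) s$aic, numeric(1)), assuming each summary exposes $aic as the code above does.

```r
# Hypothetical AIC values standing in for the 20 fitted models
aic_values <- c(-512.3, -498.7, -520.1, -505.0)

best <- which.min(aic_values)  # index of the lowest AIC
best
aic_values[best]
```

Unlike the paste0() approach, which concatenates every matching index into one string when there are ties, which.min() always returns a single integer that can be used directly as fit.a[[best]].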

Regarding "r - Creating a multivariate matrix in tidymodels recipes::recipe()", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/71758668/
