
r - modelr: Fitting multiple models with resampled data

Reposted. Author: 行者123. Updated: 2023-12-04 09:43:28

The tidy model of data science (TM) implemented in modelr organises resampled data using list-columns:

library(modelr)
library(tidyverse)

# create the k-folds
df_heights_resampled = heights %>%
  crossv_kfold(k = 10, id = "Resample ID")
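To make the list-column layout concrete, here is a minimal sketch that inspects the first fold. The seed and the variable names `folds`, `first_train` and `first_test` are assumptions made for illustration; they are not part of the original question.

```r
library(modelr)

set.seed(42)  # arbitrary seed (an assumption) so the folds are reproducible
folds <- crossv_kfold(heights, k = 10)

# Each element of the train/test list-columns is a "resample" object:
# a pointer to the original data plus the row indices of that fold.
first_train <- folds$train[[1]]
first_test  <- folds$test[[1]]

class(first_train)                 # "resample"
nrow(as.data.frame(first_train))   # rows used for fitting in fold 1
nrow(as.data.frame(first_test))    # rows held out in fold 1
```

Because a resample object only stores indices, `as.data.frame()` is what materialises the actual rows; the training and test rows of a fold together cover the whole `heights` data set.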

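It is possible to map a model onto each training data set in the list-column train, and to compute a performance metric by mapping over the list-column test.

If this has to be done with multiple models, the steps must be repeated for each model.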
# create a list of formulas
formulas_heights = formulas(
  .response = ~ income,
  model1 = ~ height + weight + marital + sex,
  model2 = ~ height + weight + marital + sex + education
)

# fit each of the models in the list of formulas
df_heights_resampled = df_heights_resampled %>%
  mutate(
    model1 = map(train, function(train_data) {
      lm(formulas_heights[[1]], data = train_data)
    }),
    model2 = map(train, function(train_data) {
      lm(formulas_heights[[2]], data = train_data)
    })
  )

# score the models on the test sets
df_heights_resampled = df_heights_resampled %>%
  mutate(
    rmse1 = map2_dbl(.x = model1, .y = test, .f = rmse),
    rmse2 = map2_dbl(.x = model2, .y = test, .f = rmse)
  )

This gives:
> df_heights_resampled
# A tibble: 10 × 7
            train           test `Resample ID`   model1   model2    rmse1    rmse2
           <list>         <list>         <chr>   <list>   <list>    <dbl>    <dbl>
1  <S3: resample> <S3: resample>            01 <S3: lm> <S3: lm> 58018.35 53903.99
2  <S3: resample> <S3: resample>            02 <S3: lm> <S3: lm> 55117.37 50279.38
3  <S3: resample> <S3: resample>            03 <S3: lm> <S3: lm> 49005.82 44613.93
4  <S3: resample> <S3: resample>            04 <S3: lm> <S3: lm> 55437.07 51068.90
5  <S3: resample> <S3: resample>            05 <S3: lm> <S3: lm> 48845.35 44673.88
6  <S3: resample> <S3: resample>            06 <S3: lm> <S3: lm> 58226.69 54010.50
7  <S3: resample> <S3: resample>            07 <S3: lm> <S3: lm> 56571.93 53322.41
8  <S3: resample> <S3: resample>            08 <S3: lm> <S3: lm> 46084.82 42294.50
9  <S3: resample> <S3: resample>            09 <S3: lm> <S3: lm> 59762.22 54814.55
10 <S3: resample> <S3: resample>            10 <S3: lm> <S3: lm> 45328.48 41882.79

Question:

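This gets cumbersome quickly when the number of models to explore is large. modelr provides the fit_with function, which iterates over a number of models (characterised by multiple formulas), but it does not seem to accept a list-column such as train in the example above. I assume that one of the *map* family of functions will make this possible (invoke_map?), but I have not been able to figure out how.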
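For context, the sketch below shows what fit_with does in its documented form: it takes one data set, a fitting function and a list of formulas, and returns a named list of fitted models. Fitting on the full `heights` data and the name `fits_full` are assumptions for illustration; this is not a solution to the list-column problem.

```r
library(modelr)

formulas_heights <- formulas(
  .response = ~ income,
  model1 = ~ height + weight + marital + sex,
  model2 = ~ height + weight + marital + sex + education
)

# fit_with() works on ONE data set (here the full heights data, an
# assumption for illustration) and fits one model per formula.
fits_full <- fit_with(heights, lm, formulas_heights)

names(fits_full)          # one entry per formula
class(fits_full$model1)   # each entry is an lm fit
```

The function iterates over formulas for a fixed data set, which is why it does not directly handle a column of ten different training sets.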

Best answer

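You can build the calls programmatically using map and lazyeval::interp. I'm curious whether there is a pure purrr solution, but the issue is that you want to create multiple columns, and that requires multiple calls. Perhaps a purrr solution would create another list-column that contains all the models.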

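library(lazyeval)
# one model-fitting call per formula; the names (model1, model2) carry over
model_calls <- map(formulas_heights,
                   ~interp(~map(train, ~lm(form, data = .x)), form = .x))
# one scoring call per model column, referring to the column by name
score_calls <- map(names(model_calls),
                   ~interp(~map2_dbl(.x = m, .y = test, .f = rmse), m = as.name(.x)))
names(score_calls) <- paste0("rmse", seq_along(score_calls))

# note: mutate_() is deprecated in recent dplyr releases
df_heights_resampled %>% mutate_(.dots = c(model_calls, score_calls))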

# A tibble: 10 × 7
            train           test `Resample ID`   model1   model2    rmse1    rmse2
           <list>         <list>         <chr>   <list>   <list>    <dbl>    <dbl>
1  <S3: resample> <S3: resample>            01 <S3: lm> <S3: lm> 44720.86 41452.07
2  <S3: resample> <S3: resample>            02 <S3: lm> <S3: lm> 54174.38 48823.03
3  <S3: resample> <S3: resample>            03 <S3: lm> <S3: lm> 56854.21 52725.62
4  <S3: resample> <S3: resample>            04 <S3: lm> <S3: lm> 53312.38 48797.48
5  <S3: resample> <S3: resample>            05 <S3: lm> <S3: lm> 61883.90 57469.17
6  <S3: resample> <S3: resample>            06 <S3: lm> <S3: lm> 55709.83 50867.26
7  <S3: resample> <S3: resample>            07 <S3: lm> <S3: lm> 53036.06 48698.07
8  <S3: resample> <S3: resample>            08 <S3: lm> <S3: lm> 55986.83 52717.94
9  <S3: resample> <S3: resample>            09 <S3: lm> <S3: lm> 51738.60 48006.74
10 <S3: resample> <S3: resample>            10 <S3: lm> <S3: lm> 45061.22 41480.35
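The list-column idea mentioned in the answer can be sketched roughly as follows, using only purrr and dplyr. This is an illustration under stated assumptions, not the answerer's code: the seed and the names `cv_results`, `models`, `rmse_by_model` and `mean_rmse` are invented here.

```r
library(modelr)
library(purrr)
library(dplyr)

set.seed(2023)  # arbitrary seed (an assumption) so the folds are reproducible

formulas_heights <- formulas(
  .response = ~ income,
  model1 = ~ height + weight + marital + sex,
  model2 = ~ height + weight + marital + sex + education
)

cv_results <- heights %>%
  crossv_kfold(k = 10, id = "Resample ID") %>%
  mutate(
    # one named list of lm fits per fold, one element per formula
    models = map(train, function(train_data) {
      map(formulas_heights, function(f) lm(f, data = as.data.frame(train_data)))
    }),
    # one named numeric vector of RMSEs per fold, scored on that fold's test set
    rmse_by_model = map2(models, test, function(fits, test_data) {
      map_dbl(fits, function(fit) rmse(fit, as.data.frame(test_data)))
    })
  )

# stack the per-fold vectors into a folds-by-models matrix and average per model
mean_rmse <- colMeans(do.call(rbind, cv_results$rmse_by_model))
mean_rmse
```

Adding a model then only means adding a formula to `formulas_heights`; the trade-off is that models and scores live inside nested lists rather than in one column per model.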

Regarding "r - modelr: Fitting multiple models with resampled data", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/40204405/
