r - Why does an "id variable" in tidymodels/recipes end up acting as a predictor?

Reposted by: 行者123. Updated: 2023-12-05 08:31:12

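This is the same question as "Predict with step_naomit and retain ID using tidymodels". That question has an accepted answer, but the OP's last comment points out that the "id variable" is still used as a predictor, as can be seen by inspecting model$fit$variable.importance.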

I have a dataset that contains an "id variable" I want to keep. I thought I could achieve this through the recipe() specification.

library(tidymodels)

# label is an identifier variable I want to keep even though it's not
# a predictor
df <- tibble(label = 1:50, 
             x = rnorm(50, 0, 5), 
             f = factor(sample(c('a', 'b', 'c'), 50, replace = TRUE)),
             y = factor(sample(c('Y', 'N'), 50, replace = TRUE)) )

df_split <- initial_split(df, prop = 0.70)

# Make up any recipe: just note I specify 'label' as "id variable"
rec <- recipe(training(df_split)) %>% 
  update_role(label, new_role = "id variable") %>% 
  update_role(y, new_role = "outcome") %>% 
  update_role(x, new_role = "predictor") %>% 
  update_role(f, new_role = "predictor") %>% 
  step_corr(all_numeric(), -all_outcomes()) %>%
  step_dummy(all_predictors(),-all_numeric()) %>% 
  step_meanimpute(all_numeric(), -all_outcomes())

train_juiced <- prep(rec, training(df_split)) %>% juice()

logit_fit <- logistic_reg(mode = "classification") %>%
  set_engine(engine = "glm") %>% 
  fit(y ~ ., data = train_juiced)

# Why is label a variable in the model ?
logit_fit[['fit']][['coefficients']]
#> (Intercept)       label           x         f_b         f_c 
#>  1.03664140 -0.01405316  0.22357266 -1.80701531 -1.66285399

Created on 2020-01-27 by the reprex package (v0.3.0)

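But even though I did specify that label is an id variable, it is still used as a predictor. So maybe I can put only the terms I want in the formula and then add label separately as an id variable.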

rec <- recipe(training(df_split), y ~ x + f) %>% 
  update_role(label, new_role = "id variable") %>% 
  step_corr(all_numeric(), -all_outcomes()) %>%
  step_dummy(all_predictors(),-all_numeric()) %>% 
  step_meanimpute(all_numeric(), -all_outcomes())
#> Error in .f(.x[[i]], ...): object 'label' not found

Created on 2020-01-27 by the reprex package (v0.3.0)

I can try not mentioning label at all:

rec <- recipe(training(df_split), y ~ x + f) %>% 
  step_corr(all_numeric(), -all_outcomes()) %>%
  step_dummy(all_predictors(),-all_numeric()) %>% 
  step_meanimpute(all_numeric(), -all_outcomes())


train_juiced <- prep(rec, training(df_split)) %>% juice()

logit_fit <- logistic_reg(mode = "classification") %>%
  set_engine(engine = "glm") %>% 
  fit(y ~ ., data = train_juiced)

# label is no longer a variable in the model
logit_fit[['fit']][['coefficients']]
#> (Intercept)           x         f_b         f_c 
#> -0.98950228  0.03734093  0.98945339  1.27014824

train_juiced
#> # A tibble: 35 x 4
#>          x y       f_b   f_c
#>      <dbl> <fct> <dbl> <dbl>
#>  1 -0.928  Y         1     0
#>  2  4.54   N         0     0
#>  3 -1.14   N         1     0
#>  4 -5.19   N         1     0
#>  5 -4.79   N         0     0
#>  6 -6.00   N         0     0
#>  7  3.83   N         0     1
#>  8 -8.66   Y         1     0
#>  9 -0.0849 Y         1     0
#> 10 -3.57   Y         0     1
#> # ... with 25 more rows

Created on 2020-01-27 by the reprex package (v0.3.0)

OK, the model works now, but my label is gone.
How should I do this?

Best answer

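The main conceptual problem you are running into is that once you juice() a recipe, the result is just data, i.e. literally just a data frame. When you use it to fit a model, the model has no way of knowing that some variables had special roles.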

library(tidymodels)

# label is an identifier variable to keep even though it's not a predictor
df <- tibble(label = 1:50, 
             x = rnorm(50, 0, 5), 
             f = factor(sample(c('a', 'b', 'c'), 50, replace = TRUE)),
             y = factor(sample(c('Y', 'N'), 50, replace = TRUE)) )

df_split <- initial_split(df, prop = 0.70)

rec <- recipe(y ~ ., training(df_split)) %>% 
  update_role(label, new_role = "id variable") %>% 
  step_corr(all_numeric(), -all_outcomes()) %>%
  step_dummy(all_predictors(),-all_numeric()) %>% 
  step_meanimpute(all_numeric(), -all_outcomes()) %>%
  prep()

train_juiced <- juice(rec)
train_juiced
#> # A tibble: 35 x 5
#>    label     x y       f_b   f_c
#>    <int> <dbl> <fct> <dbl> <dbl>
#>  1     1  1.80 N         1     0
#>  2     3  1.45 N         0     0
#>  3     5 -5.00 N         0     0
#>  4     6 -4.15 N         1     0
#>  5     7  1.37 Y         0     1
#>  6     8  1.62 Y         0     1
#>  7    10 -1.77 Y         1     0
#>  8    11 -3.15 N         0     1
#>  9    12 -2.02 Y         0     1
#> 10    13  2.65 Y         0     1
#> # … with 25 more rows

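Notice that train_juiced is just a regular tibble. If you train a model on this tibble with fit(), the model will not know anything about the recipe that was used to transform the data.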
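If you still want to fit on the juiced data directly, one possible workaround is to read the roles back from the prepped recipe and build the model formula from the predictor-role columns only. The following is a minimal sketch, not part of the original answer: the seed, the use of base glm() instead of parsnip, and the names role_info, predictor_names and model_formula are assumptions made for illustration.

```r
library(recipes)

set.seed(123)  # arbitrary seed (assumption), only to make the sketch reproducible

df <- data.frame(
  label = 1:50,
  x = rnorm(50, 0, 5),
  f = factor(sample(c("a", "b", "c"), 50, replace = TRUE)),
  y = factor(sample(c("Y", "N"), 50, replace = TRUE))
)

rec <- recipe(y ~ ., data = df) %>%
  update_role(label, new_role = "id variable") %>%
  step_dummy(all_predictors(), -all_numeric()) %>%
  prep()

# The juiced data still contains label, but carries no role information
train_juiced <- juice(rec)

# The prepped recipe does remember the roles: one row per column
role_info <- summary(rec)
predictor_names <- role_info$variable[role_info$role %in% "predictor"]

# Build y ~ <predictor columns only> instead of y ~ .
model_formula <- reformulate(predictor_names, response = "y")

glm_fit <- glm(model_formula, data = train_juiced, family = binomial)
names(coef(glm_fit))
```

The id column stays in train_juiced for later bookkeeping, while the formula keeps it out of the model. This is more manual than the approach the answer recommends.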

The tidymodels framework does have a way to train models using the role information from a recipe. Probably the easiest way is to use workflows.

logit_spec <- logistic_reg(mode = "classification") %>%
  set_engine(engine = "glm") 

wf <- workflow() %>%
  add_model(logit_spec) %>%
  add_recipe(rec)

logit_fit <- fit(wf, training(df_split))

# No more label in the model
logit_fit
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: logistic_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> 3 Recipe Steps
#> 
#> ● step_corr()
#> ● step_dummy()
#> ● step_meanimpute()
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> 
#> Call:  stats::glm(formula = formula, family = stats::binomial, data = data)
#> 
#> Coefficients:
#> (Intercept)            x          f_b          f_c  
#>     0.42331     -0.04234     -0.04991      0.64728  
#> 
#> Degrees of Freedom: 34 Total (i.e. Null);  31 Residual
#> Null Deviance:       45 
#> Residual Deviance: 44.41     AIC: 52.41

Created on 2020-02-15 by the reprex package (v0.3.0)

No more label in the model!
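Since the original goal was to keep the identifier, it may help to see how label can be attached back to the predictions once the workflow is fitted. This is a hedged sketch rather than part of the original answer: the seed and the names wf_fit, test_df and preds are assumptions, and the recipe is trimmed to a single step_dummy() to keep the example short.

```r
library(tidymodels)

set.seed(2020)  # arbitrary seed (assumption), only to make the sketch reproducible

df <- tibble(label = 1:50,
             x = rnorm(50, 0, 5),
             f = factor(sample(c("a", "b", "c"), 50, replace = TRUE)),
             y = factor(sample(c("Y", "N"), 50, replace = TRUE)))

df_split <- initial_split(df, prop = 0.70)

# label gets a non-predictor role, so the workflow keeps it out of the model
rec <- recipe(y ~ ., data = training(df_split)) %>%
  update_role(label, new_role = "id variable") %>%
  step_dummy(all_predictors(), -all_numeric())

wf_fit <- workflow() %>%
  add_model(logistic_reg() %>% set_engine("glm")) %>%
  add_recipe(rec) %>%
  fit(data = training(df_split))

# predict() returns one row per row of new_data, in the same order,
# so the id column can be bound back on afterwards
test_df <- testing(df_split)
preds <- predict(wf_fit, new_data = test_df) %>%
  bind_cols(select(test_df, label, y))

preds
```

The test data passed to predict() still contains label, because the recipe expects the same columns it saw during training, but the fitted glm never uses it.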

This question, "r - Why does an "id variable" in tidymodels/recipes end up acting as a predictor?", comes from Stack Overflow: https://stackoverflow.com/questions/59941616/
