r - dplyr 0.6.0 programming: unquoting

Reposted by: 行者123 Updated: 2023-12-03 02:27:47

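I'm trying to write a simple wrapper around summarise() that summarises arbitrary variables by arbitrary groups. I have made progress now that I have the correct library version loaded, but I am (again) confused about how to unquote arguments that hold multiple values.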

I currently have the following function...

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = site,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    quo_group  <- enquo(group)
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
           group_by(!!quo_group, variable) %>%
           summarise(n    = n(),
                     mean = mean(value, na.rm = TRUE))
    return(results)
}

If I specify a single grouping variable, it mostly works...

> table_summary(df = mtcars, id = model, select = c(mpg), group = gear)
# A tibble: 3 x 4
# Groups:   c(gear) [?]
       gear variable     n     mean
      <dbl>    <chr> <int>    <dbl>
1         3      mpg    15 16.10667
2         4      mpg    12 24.53333
3         5      mpg     5 21.38000

...but it fails at group_by(!!quo_group, variable) when I specify more than one variable with group = c(gear, hp)...

> mtcars$model <- rownames(mtcars)
> table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear, hp))
Error in mutate_impl(.data, dots) : 
  Column `c(gear, hp)` must be length 32 (the group size) or one, not 64

I went back and re-read the programming dplyr documentation, where I read that you can capture multiple variables using quos() rather than enquo() and then unquote-splice them with !!!, so I tried...

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = c(),
                          digits = 3,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    quo_group  <- quos(group)  ## Use quos() rather than enquo()
    UQS(quo_group) %>% print() ## Check to see what quo_group holds
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
               group_by(!!!quo_group, variable) %>%
               summarise(n    = n(),
                         mean = mean(value, na.rm = TRUE))
    return(results)
}

...which now fails at the first reference to !!!quo_group within dplyr::select(), regardless of how many variables are specified under group =...

> table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))
[[1]]
<quosure: frame>
~group

attr(,"class")
[1] "quosures"
Error in overscope_eval_next(overscope, expr) : object 'gear' not found
> traceback()
17: .Call(rlang_eval, f_rhs(quo), overscope)
16: overscope_eval_next(overscope, expr)
15: FUN(X[[i]], ...)
14: lapply(.x, .f, ...)
13: map(.x[matches], .f, ...)
12: map_if(ind_list, !is_helper, eval_tidy, data = names_list)
11: select_vars(names(.data), !(!(!quos(...))))
10: select.data.frame(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
9: dplyr::select(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
8: function_list[[i]](value)
7: freduce(value, `_function_list`)
6: `_fseq`(`_lhs`)
5: eval(quote(`_fseq`(`_lhs`)), env, env)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
3: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
2: df %>% dplyr::select(!(!quo_id), !(!quo_select), !(!(!quo_group))) %>% 
       unique()
1: table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))
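The printed quosure (~group) and the "object 'gear' not found" error are consistent with quos(group) quoting the literal symbol group rather than the expression the caller passed. The following is a minimal sketch of that difference, assuming rlang is installed; capture_both() is a hypothetical helper for illustration and is not part of the question's code.

```r
library(rlang)

# Hypothetical helper: capture the same argument with quos() and with enquo()
capture_both <- function(group) {
  list(
    # quos(group) quotes the symbol `group` itself
    with_quos  = quo_get_expr(quos(group)[[1]]),
    # enquo(group) captures the expression supplied by the caller
    with_enquo = quo_get_expr(enquo(group))
  )
}

res <- capture_both(c(gear, hp))
deparse(res$with_quos)   # the argument name, not the user's input
deparse(res$with_enquo)  # the expression typed by the caller
```

Under this reading, evaluating the ~group quosure forces the argument c(gear) outside the data frame, where gear does not exist.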

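What seems odd, and what I think is the source of the problem, is that !!!quo_group (i.e. UQS(quo_group)) prints a single quosure, ~group, rather than a list of quosures for the variables I supplied, which is what adding print() to the worked example from the documentation shows...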

> my_summarise <- function(df, ...) {
    group_by <- quos(...)
    UQS(group_by) %>% print()
    df %>%
    group_by(!!!group_by) %>%
    summarise(a = mean(a))
  }
> df <- tibble(
    g1 = c(1, 1, 2, 2, 2),
    g2 = c(1, 2, 1, 2, 1),
    a = sample(5), 
    b = sample(5)
  )
> my_summarise(df, g1, g2)
[[1]]
<quosure: global>
~g1

[[2]]
<quosure: global>
~g2

attr(,"class")
[1] "quosures"
# A tibble: 4 x 3
# Groups:   g1 [?]
     g1    g2     a
  <dbl> <dbl> <dbl>
1     1     1   1.0
2     1     2   5.0
3     2     1   2.5
4     2     2   4.0

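I would like to supply the variables I wish to group by explicitly, as a named argument of my function, rather than as ... as in the worked example. Even so, I decided to test whether my function works when the grouping variables are supplied as ...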

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = c(),
                          digits = 3,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    ## quo_group  <- quos(group)
    quo_group  <- quos(...)
    UQS(quo_group) %>% print()
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
               group_by(!!!quo_group, variable) %>%
               summarise(n    = n(),
                         mean = mean(value, na.rm = TRUE))
    return(results)
}

...but it doesn't. quos() again unquote-splices to NULL, so the variables are neither selected nor grouped by...

> table_summary(df = mtcars, id = model, select = c(mpg), gear, hp)
NULL
# A tibble: 1 x 3
  variable     n     mean
     <chr> <int>    <dbl>
1      mpg    32 20.09062
> table_summary(df = mtcars, id = model, select = c(mpg), gear)
NULL
# A tibble: 1 x 3
  variable     n     mean
     <chr> <int>    <dbl>
1      mpg    32 20.09062
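One plausible reading of the NULL above is base R argument matching: unnamed arguments are matched positionally to the remaining formals that precede ..., so gear and hp would be bound to group and digits and ... would stay empty. The sketch below uses only base R; dots_last() and dots_first() are hypothetical functions that mimic the argument order in the question, not code from it.

```r
# Hypothetical signature with ... at the end, as in the question
dots_last <- function(df, group = NULL, digits = 3, ...) {
  dots <- match.call(expand.dots = FALSE)$...
  list(group  = deparse(substitute(group)),
       digits = deparse(substitute(digits)),
       n_dots = length(dots))
}

# Hypothetical alternative: ... placed before the optional arguments,
# so group and digits can only be set by name
dots_first <- function(df, ..., group = NULL, digits = 3) {
  length(match.call(expand.dots = FALSE)$...)
}

last_res  <- dots_last(df = mtcars, gear, hp)
first_res <- dots_first(df = mtcars, gear, hp)
```

With ... last, last_res shows gear and hp absorbed by group and digits and nothing left in the dots; with ... placed earlier, both unnamed arguments reach the dots.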

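I have been round this loop several times now, checking each way of using enquo() and quos(), but I can't see where I'm going wrong, despite having read the programming dplyr documentation several times.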

Best answer

IIUC your post, you want to supply c(col1, col2) to group_by(). This verb does not support that:

group_by(mtcars, c(cyl, am))
#> Error in mutate_impl(.data, dots) :
#>   Column `c(cyl, am)` must be length 32 (the number of rows) or one, not 64

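This is because group_by() has mutate semantics, not select semantics. That means the expressions you supply to group_by() are transformation expressions. This is a surprising but very handy feature. For example, you can group by disp cut into three intervals like this: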

group_by(mtcars, cut3 = cut(disp, 3))

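It also means that if you supply c(cyl, am), it concatenates the two columns and returns a vector of length 64, whereas a vector of length 32 (the number of rows) is expected.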
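The length mismatch can be checked in base R without dplyr. This small sketch only reproduces the arithmetic behind the error message; the variable names are chosen for illustration.

```r
# mtcars has 32 rows, so a valid grouping expression must yield 32 values (or 1)
n_rows <- nrow(mtcars)

# c() concatenates the two columns end to end instead of pairing them up
combined <- c(mtcars$cyl, mtcars$am)

n_rows
length(combined)
```

The concatenated vector is twice as long as the data frame, which matches the "not 64" in the error.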

So your problem is that you want a group_by() wrapper with select semantics. This is easy to do with dplyr::select_vars(), which will soon be extracted into the new tidyselect package:

library("dplyr")

group_wrapper <- function(df, groups = rlang::chr()) {
  groups <- select_vars(tbl_vars(df), !! enquo(groups))
  group_by(df, !!! rlang::syms(groups))
}

Alternatively, you can wrap the new group_by_at() verb, which does have select semantics:

group_wrapper <- function(df, groups = rlang::chr()) {
  group_by_at(df, vars(!! enquo(groups)))
}

Let's try it out:

group_wrapper(mtcars, c(disp, am))
#> # A tibble: 32 x 11
#> # Groups:   disp, am [27]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21.0     6   160   110  3.90  2.62  16.5     0     1     4     4
#> # ... with 22 more rows

The advantage of this interface is that it supports all select() operations for choosing the columns to group by.

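Note that I used rlang::chr() as the default argument because c() returns NULL, which is not supported by the selection functions (we may want to change that in the future). chr() called without arguments returns a character vector of length 0.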
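For readers on a much newer dplyr than the 0.6.0 discussed here, a similar wrapper can probably be written with across() and the embrace operator. This is a sketch that assumes dplyr >= 1.0.0; group_wrapper2() is a hypothetical name, and the approach is not the one given in the answer above.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, not the 0.6.0 from the question

# Hypothetical wrapper: `groups` takes a select-style expression
# such as c(gear, carb) or starts_with("c")
group_wrapper2 <- function(df, groups) {
  group_by(df, across({{ groups }}))
}

grouped <- group_wrapper2(mtcars, c(gear, carb))
group_vars(grouped)
```

No default is given for groups in this sketch, so the argument has to be supplied on every call.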

Regarding r - dplyr 0.6.0 programming: unquoting, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44202692/
