
r - dplyr-0.6.0 programming: unquoting

Reposted. Author: 行者123. Updated: 2023-12-03 02:27:47

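I'm trying to write a simple wrapper around summarise() for arbitrary variables by arbitrary groups. I've made progress now that I have the correct library version loaded, but I'm (again) confused about how to unquote arguments that have multiple values.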

I currently have the following function...

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = site,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    quo_group  <- enquo(group)
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
               group_by(!!quo_group, variable) %>%
               summarise(n    = n(),
                         mean = mean(value, na.rm = TRUE))
    return(results)
}

If I specify a single grouping variable, it does most of what I want...

> table_summary(df = mtcars, id = model, select = c(mpg), group = gear)
# A tibble: 3 x 4
# Groups:   c(gear) [?]
   gear variable     n     mean
  <dbl>    <chr> <int>    <dbl>
1     3      mpg    15 16.10667
2     4      mpg    12 24.53333
3     5      mpg     5 21.38000

...but it fails at group_by(!!quo_group, variable) when I specify more than one grouping variable, as in group = c(gear, hp)...

> mtcars$model <- rownames(mtcars)
> table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear, hp))
Error in mutate_impl(.data, dots) :
Column `c(gear, hp)` must be length 32 (the group size) or one, not 64

I went back and re-read the programming dplyr documentation, where I read that you can capture multiple variables using quos() instead of enquo() and then unquote-splice them with !!!, so I tried...

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = c(),
                          digits = 3,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    quo_group  <- quos(group)          ## Use quos() rather than enquo()
    UQS(quo_group) %>% print()         ## Check to see what quo_group holds
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
               group_by(!!!quo_group, variable) %>%
               summarise(n    = n(),
                         mean = mean(value, na.rm = TRUE))
    return(results)
}

...which now fails at the first use of !!!quo_group within dplyr::select(), regardless of how many variables are specified under group =...

> table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))
[[1]]
<quosure: frame>
~group

attr(,"class")
[1] "quosures"
Error in overscope_eval_next(overscope, expr) : object 'gear' not found
> traceback()
17: .Call(rlang_eval, f_rhs(quo), overscope)
16: overscope_eval_next(overscope, expr)
15: FUN(X[[i]], ...)
14: lapply(.x, .f, ...)
13: map(.x[matches], .f, ...)
12: map_if(ind_list, !is_helper, eval_tidy, data = names_list)
11: select_vars(names(.data), !(!(!quos(...))))
10: select.data.frame(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
9: dplyr::select(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
8: function_list[[i]](value)
7: freduce(value, `_function_list`)
6: `_fseq`(`_lhs`)
5: eval(quote(`_fseq`(`_lhs`)), env, env)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
3: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
2: df %>% dplyr::select(!(!quo_id), !(!quo_select), !(!(!quo_group))) %>%
unique()
1: table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))

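This seems strange. I think the source of the problem is that !!!quo_group (i.e. UQS(quo_group)) prints ~group rather than a list with one quosure per grouping variable, which is what adding print() to the working example shows...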

> my_summarise <- function(df, ...) {
      group_by <- quos(...)
      UQS(group_by) %>% print()
      df %>%
          group_by(!!!group_by) %>%
          summarise(a = mean(a))
  }
> df <- tibble(
      g1 = c(1, 1, 2, 2, 2),
      g2 = c(1, 2, 1, 2, 1),
      a  = sample(5),
      b  = sample(5)
  )
> my_summarise(df, g1, g2)
[[1]]
<quosure: global>
~g1

[[2]]
<quosure: global>
~g2

attr(,"class")
[1] "quosures"
# A tibble: 4 x 3
# Groups:   g1 [?]
     g1    g2     a
  <dbl> <dbl> <dbl>
1     1     1   1.0
2     1     2   5.0
3     2     1   2.5
4     2     2   4.0
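The difference between the two printouts can be reproduced in isolation. This is a minimal sketch assuming a current rlang release, where enquo() and enquos() are the documented ways to capture what the caller typed; capture_with_quos(), capture_with_enquo() and capture_dots() are hypothetical helpers used only for illustration. quos(group) quotes the parameter name group exactly as written inside the function, whereas enquo(group) captures the caller's expression (gear).

```r
library(rlang)

# Hypothetical helper: quos() quotes its arguments as written here,
# so it captures the parameter name `group`, not the caller's expression
capture_with_quos <- function(group) quos(group)

# Hypothetical helper: enquo() captures the expression the caller supplied
capture_with_enquo <- function(group) enquo(group)

# Hypothetical helper: enquos(...) captures every expression passed through dots
capture_dots <- function(...) enquos(...)

q1 <- capture_with_quos(gear)
q2 <- capture_with_enquo(gear)
q3 <- capture_dots(gear, hp)

quo_get_expr(q1[[1]])     # the symbol `group`
quo_get_expr(q2)          # the symbol `gear`
lapply(q3, quo_get_expr)  # the symbols `gear` and `hp`
```

None of the helpers evaluate their arguments, so gear and hp do not need to exist in the calling environment.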

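I want to supply the variables I wish to group by explicitly, as a named argument of my function, rather than through ..., but I decided to test whether my function works when the grouping variables are supplied via ...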

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = c(),
                          digits = 3,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    ## quo_group <- quos(group)
    quo_group  <- quos(...)
    UQS(quo_group) %>% print()
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
               group_by(!!!quo_group, variable) %>%
               summarise(n    = n(),
                         mean = mean(value, na.rm = TRUE))
    return(results)
}

...but it doesn't: quos() again unquote-splices to NULL, so the variables are neither selected nor grouped by...

> table_summary(df = mtcars, id = model, select = c(mpg), gear, hp)
NULL
# A tibble: 1 x 3
  variable     n     mean
     <chr> <int>    <dbl>
1      mpg    32 20.09062
> table_summary(df = mtcars, id = model, select = c(mpg), gear)
NULL
# A tibble: 1 x 3
  variable     n     mean
     <chr> <int>    <dbl>
1      mpg    32 20.09062
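Independently of quoting, R's argument matching explains why quos(...) comes back empty in this last attempt. In the third version of table_summary(), group and digits are declared before ..., so once df, id and select are matched by name, the unnamed gear and hp fill group and digits by position and nothing reaches .... The hypothetical show_matching() mimics that signature to make this visible with base R only (...length() needs R 3.5.0 or later).

```r
# Hypothetical function with the same argument order as the third table_summary()
show_matching <- function(df, id, select = c(), group = c(), digits = 3, ...) {
  list(
    group  = substitute(group),   # expression matched to `group`
    digits = substitute(digits),  # expression matched to `digits`
    n_dots = ...length()          # how many arguments ended up in `...`
  )
}

res <- show_matching(df = mtcars, id = model, select = c(mpg), gear, hp)
res$group   # gear
res$digits  # hp
res$n_dots  # 0

# Hypothetical variant: with `...` declared before `digits`, unnamed arguments
# are collected by `...`, and `digits` can only be set by name
show_matching2 <- function(df, id, select = c(), ..., digits = 3) {
  list(n_dots = ...length())
}

res2 <- show_matching2(df = mtcars, id = model, select = c(mpg), gear, hp)
res2$n_dots  # 2
```

Nothing is evaluated here (substitute() only inspects the promises), so model, gear and hp do not need to exist.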

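I've been round this loop several times now, checking every approach with enquo() and quos(), but I can't see where I'm going wrong, despite having read the programming dplyr documentation several times.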

Best answer

IIUC your post, you want to supply c(col1, col2) to group_by(). That verb doesn't support this:

group_by(mtcars, c(cyl, am))
#> Error in mutate_impl(.data, dots) :
#> Column `c(cyl, am)` must be length 32 (the number of rows) or one, not 64

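This is because group_by() has mutate semantics, not select semantics. This means the expressions you supply to group_by() are transformation expressions. This is a surprising but very handy feature. For example, you can group by disp cut into three intervals like this: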

group_by(mtcars, cut3 = cut(disp, 3))

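It also means that if you supply c(cyl, am), it concatenates the two columns and returns a vector of length 64, whereas it expects a length of 32 (the number of rows).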
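The length mismatch can be checked with base R alone. This sketch uses with() as a simplified stand-in for dplyr's data masking: evaluating c(cyl, am) with the mtcars columns in scope concatenates the two 32-row columns, while a per-row transformation such as cut(disp, 3) keeps one value per row.

```r
# Evaluate the expression with mtcars's columns in scope, roughly as a mutate-style verb does
combined <- with(mtcars, c(cyl, am))
length(combined)  # 64: both columns concatenated end to end
nrow(mtcars)      # 32: the length a new grouping column must have

# A transformation that returns one value per row is fine, e.g. disp cut into 3 bins
bins <- with(mtcars, cut(disp, 3))
length(bins)      # 32
nlevels(bins)     # 3
```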

So your problem is that you want a group_by() wrapper with select semantics. This is easy to do with dplyr::select_vars(), which will soon be extracted into the new tidyselect package:

library("dplyr")

group_wrapper <- function(df, groups = rlang::chr()) {
    groups <- select_vars(tbl_vars(df), !! enquo(groups))
    group_by(df, !!! rlang::syms(groups))
}
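select_vars() did later move to the tidyselect package and is deprecated in current dplyr. A sketch of the same idea with tidyselect::eval_select(), assuming tidyselect >= 1.0.0 and dplyr >= 1.0.0; group_wrapper_select() is a hypothetical name. eval_select() returns a named integer vector of column positions, whose names are turned into symbols and spliced into group_by() with !!!.

```r
library(dplyr)

# Hypothetical wrapper: resolve `groups` with select semantics, then splice
# the selected column names into group_by() as symbols
group_wrapper_select <- function(df, groups = character()) {
  pos <- tidyselect::eval_select(rlang::enquo(groups), df)
  group_by(df, !!!rlang::syms(names(pos)))
}

g1 <- group_wrapper_select(mtcars, c(disp, am))
group_vars(g1)  # "disp" "am"

# Any select() syntax works, including helpers such as starts_with()
g2 <- group_wrapper_select(mtcars, starts_with("c"))
group_vars(g2)  # "cyl" "carb"
```

Base character() plays the same role here as rlang::chr() in the answer: a zero-length character default that selects nothing.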

Alternatively, you can wrap the new group_by_at() verb, which does have select semantics:

group_wrapper <- function(df, groups = rlang::chr()) {
    group_by_at(df, vars(!! enquo(groups)))
}

Let's try it:

group_wrapper(mtcars, c(disp, am))
#> # A tibble: 32 x 11
#> # Groups:   disp, am [27]
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21.0     6   160   110  3.90  2.62  16.5     0     1     4     4
#> # ... with 22 more rows

The advantage of this interface is that it supports all select() operations for choosing the columns to group by.
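group_by_at() and vars() have since been superseded in dplyr. A minimal sketch of the same select-semantics interface using pick() and the {{ }} (embrace) operator, assuming dplyr >= 1.1.0, where pick() was introduced; group_wrapper_pick() is a hypothetical name, and in this sketch groups must be supplied.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where pick() is available

# Hypothetical wrapper: {{ groups }} forwards the caller's expression, and pick()
# evaluates it with select semantics, so c(disp, am) means two columns, not one vector
group_wrapper_pick <- function(df, groups) {
  group_by(df, pick({{ groups }}))
}

g <- group_wrapper_pick(mtcars, c(disp, am))
group_vars(g)  # "disp" "am"
n_groups(g)    # 27
```

pick() returns a data frame of the selected columns, and group_by() splices an unnamed data frame result into separate grouping columns, which is why the grouping variables keep their original names.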

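Note that I used rlang::chr() as the default argument because c() returns NULL, which is not supported by the select functions (we might want to change that in the future). Called without arguments, chr() returns a character vector of length 0.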
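Putting the answer's advice together, the question's table_summary() can be given select semantics for every column argument. This is a sketch, not the answer author's code: it assumes dplyr >= 1.1.0 (for pick()) and tidyr >= 1.0.0, swaps the question's gather() for its successor pivot_longer() and unique() for distinct(), and drops the unused digits argument.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for pick()
library(tidyr)  # assumes tidyr >= 1.0.0 for pivot_longer()

table_summary <- function(df, id, select, group) {
  df %>%
    # select semantics: bare names, c(...) and helpers are all accepted
    dplyr::select({{ id }}, {{ select }}, {{ group }}) %>%
    distinct() %>%
    # long format, in case more than one variable is summarised
    pivot_longer(cols = {{ select }}, names_to = "variable", values_to = "value") %>%
    # pick() gives group_by() select semantics, so group = c(gear, hp) works
    group_by(pick({{ group }}), variable) %>%
    summarise(n = n(), mean = mean(value, na.rm = TRUE), .groups = "drop")
}

mtcars$model <- rownames(mtcars)
by_gear    <- table_summary(mtcars, id = model, select = c(mpg), group = gear)
by_gear_hp <- table_summary(mtcars, id = model, select = c(mpg), group = c(gear, hp))
```

With group = gear this reproduces the question's single-group result (n of 15, 12 and 5, mean mpg of about 16.11 for 3 gears), and with group = c(gear, hp) it returns one row per gear/hp combination.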

Regarding "r - dplyr-0.6.0 programming: unquoting", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44202692/
