r - Supplying multiple groups of variables to a function for dplyr arguments in the function body

Reposted. Author: 行者123. Updated: 2023-12-04 18:20:38

Here is the data:

library(tidyverse)

data <- tibble::tribble(
  ~var1, ~var2, ~var3, ~var4, ~var5,
  "a", "d", "g", "hello", 1L,
  "a", "d", "h", "hello", 2L,
  "b", "e", "h", "k", 4L,
  "b", "e", "h", "k", 7L,
  "c", "f", "i", "hello", 3L,
  "c", "f", "i", "hello", 4L
)

And here are the vectors I want to use:
filter_var <- c("hello")
groupby_vars1 <- c("var1", "var2", "var3")
groupby_vars2 <- c("var1", "var2")
joinby_vars1 <- c("var1", "var2")
joinby_vars2 <- c("var1", "var2", "var3")

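The 2nd and 5th vectors are identical, as are the 3rd and 4th, but please assume they are different and keep them as separate vectors.

Now I want to write a generic function that takes the data and these vectors and returns the result.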
my_fun <- function(data, filter_var, groupby_vars1, groupby_vars2, joinby_vars1, joinby_vars2) {

  data2 <- data %>% filter(var4 == filter_var)

  data3 <- data2 %>%
    group_by(groupby_vars1) %>%
    summarise(var6 = sum(var5))

  data4 <- data3 %>%
    ungroup() %>%
    group_by(groupby_vars2) %>%
    summarise(avg = mean(var6, na.rm = T))

  data5 <- data3 %>% left_join(data4, by = joinby_vars1)

  data6 <- data %>% left_join(data5, by = joinby_vars2)
}

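The problem is how to pass several vectors, each holding multiple variable names, into the function so that they can be used as dplyr arguments in the body. I looked at http://dplyr.tidyverse.org/articles/programming.html but could not solve the problem above.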

Best answer

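group_by() cannot take the groupby_vars... character vectors as input directly. You need to use rlang::syms() to convert each character vector into a list of symbols, then unquote-splice that list with !!! so that the symbols are evaluated as column names inside group_by().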
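As a minimal sketch of that conversion in isolation (the tibble `df` and the vector `grp` are made-up illustrations, not part of the original question), syms() turns each string into a symbol and !!! splices the resulting list into group_by() as separate arguments:

```r
library(dplyr)
library(rlang)

# Hypothetical toy data, for illustration only
df <- tibble(
  g1 = c("a", "a", "b"),
  g2 = c("x", "x", "y"),
  n  = c(1L, 2L, 4L)
)
grp <- c("g1", "g2")

# syms() returns a list of symbols, one per string
grp_syms <- syms(grp)

# !!! splices the list, so this behaves like group_by(g1, g2)
res <- df %>%
  group_by(!!!grp_syms) %>%
  summarise(total = sum(n)) %>%
  ungroup()

res
```

The by argument of left_join() is different: it already accepts a character vector, which is why only the grouping vectors need converting.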

library(tidyverse)
library(rlang)

data <- tibble::tribble(
  ~var1, ~var2, ~var3, ~var4, ~var5,
  "a", "d", "g", "hello", 1L,
  "a", "d", "h", "hello", 2L,
  "b", "e", "h", "k", 4L,
  "b", "e", "h", "k", 7L,
  "c", "f", "i", "hello", 3L,
  "c", "f", "i", "hello", 4L
)

filter_var <- c("hello")
groupby_vars1 <- c("var1", "var2", "var3")
groupby_vars2 <- c("var1", "var2")
joinby_vars1 <- c("var1", "var2")
joinby_vars2 <- c("var1", "var2", "var3")

my_fun <- function(data, filter_var,
                   groupby_vars1, groupby_vars2,
                   joinby_vars1, joinby_vars2) {

  # Convert the character vectors into lists of symbols
  groupby_vars1 <- syms(groupby_vars1)
  groupby_vars2 <- syms(groupby_vars2)

  data2 <- data %>%
    filter(var4 == filter_var)

  # Splice the symbols into group_by() with !!!
  data3 <- data2 %>%
    group_by(!!! groupby_vars1) %>%
    summarise(var6 = sum(var5))

  data4 <- data3 %>%
    ungroup() %>%
    group_by(!!! groupby_vars2) %>%
    summarise(avg = mean(var6, na.rm = TRUE))

  # `by` accepts character vectors, so no conversion is needed here
  data5 <- data3 %>%
    left_join(data4, by = joinby_vars1)

  data6 <- data %>%
    left_join(data5, by = joinby_vars2)

  return(data6)
}

my_fun(data, filter_var,
       groupby_vars1, groupby_vars2,
       joinby_vars1, joinby_vars2)

#> # A tibble: 6 x 7
#>   var1  var2  var3  var4   var5  var6   avg
#>   <chr> <chr> <chr> <chr> <int> <int> <dbl>
#> 1 a     d     g     hello     1     1   1.5
#> 2 a     d     h     hello     2     2   1.5
#> 3 b     e     h     k         4    NA    NA
#> 4 b     e     h     k         7    NA    NA
#> 5 c     f     i     hello     3     7     7
#> 6 c     f     i     hello     4     7     7

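Another approach: parse the character vectors with parse_exprs() when calling the function, then unquote-splice the resulting expressions inside the function. The original answer also links to a related post at this point.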
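One way to see how the two helpers differ, sketched with illustrative strings that are not taken from the original answer: syms() always builds a symbol from each string, whereas parse_exprs() parses each string as R code, so it yields a symbol for a plain column name and a call for anything more complex.

```r
library(rlang)

# For plain column names, both helpers produce symbols
s <- syms(c("var1", "var2"))
p <- parse_exprs(c("var1", "var2"))
is.symbol(s[[1]])
is.symbol(p[[1]])

# For a string that contains code, the results differ:
# parse_exprs() yields a call, sym() a single oddly named symbol
e <- parse_exprs("var5 * 2")[[1]]
y <- sym("var5 * 2")
is.call(e)
is.symbol(y)
```

When the inputs are known to be column names, syms() is arguably the narrower and safer choice, since parse_exprs() will accept any parseable code.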

my_fun2 <- function(data, filter_var,
                    groupby_vars1, groupby_vars2,
                    joinby_vars1, joinby_vars2) {

  # groupby_vars1 and groupby_vars2 must already be lists of expressions
  # (for example from parse_exprs()), not character vectors

  data2 <- data %>%
    filter(var4 == filter_var)

  data3 <- data2 %>%
    group_by(!!! groupby_vars1) %>%
    summarise(var6 = sum(var5))

  data4 <- data3 %>%
    ungroup() %>%
    group_by(!!! groupby_vars2) %>%
    summarise(avg = mean(var6, na.rm = TRUE))

  data5 <- data3 %>%
    left_join(data4, by = joinby_vars1)

  data6 <- data %>%
    left_join(data5, by = joinby_vars2)

  return(data6)
}

my_fun2(data, filter_var,
        parse_exprs(groupby_vars1), parse_exprs(groupby_vars2),
        joinby_vars1, joinby_vars2)

identical(my_fun(data, filter_var,
                 groupby_vars1, groupby_vars2,
                 joinby_vars1, joinby_vars2),
          my_fun2(data, filter_var,
                  parse_exprs(groupby_vars1), parse_exprs(groupby_vars2),
                  joinby_vars1, joinby_vars2))

#> [1] TRUE

Created on 2018-04-24 by the reprex package (v0.2.0).
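For comparison, and not part of the original answer: dplyr 1.0.0 and later also accept a character vector of column names through across(all_of(...)), which avoids the explicit conversion to symbols. The sketch below assumes such a dplyr version is installed and uses made-up data (`df`, `grp`) purely for illustration.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
df <- tibble(
  g1 = c("a", "a", "b"),
  g2 = c("x", "x", "y"),
  n  = c(1L, 2L, 4L)
)
grp <- c("g1", "g2")

# all_of() selects the columns named in the character vector;
# across() makes that selection usable inside group_by()
res <- df %>%
  group_by(across(all_of(grp))) %>%
  summarise(total = sum(n)) %>%
  ungroup()

res
```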

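Regarding "r - Supplying multiple groups of variables to a function for dplyr arguments in the function body", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50011988/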
