gpt4 book ai didi

r - 如何使用 dplyr 来总结两个函数

转载 作者:行者123 更新时间:2023-12-02 18:21:07 25 4
gpt4 key购买 nike

我尝试将此数据集作为示例进行总结,并尝试使用多个函数 n()mean()。如何将两者结合在同一个工作流程中?

这是一个反射(reflect)我的较大数据的玩具数据集:

library(tidyverse)
df <- structure(list(group_var = c(70, 72, 73, 70, 70, 71, 70, 71,
71, 70), var1_scr = c(50.5, 25.75, 50.5, 50.5, 50.5, 50.5, 75.25,
75.25, 50.5, 75.25), var2_scr = c(50.5, 50.5, NA, 75.25, 50.5,
50.5, 75.25, 75.25, 100, 75.25), var3_scr = c(NA, NA, 75.25,
NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
df
#> # A tibble: 10 x 4
#> group_var var1_scr var2_scr var3_scr
#> <dbl> <dbl> <dbl> <dbl>
#> 1 70 50.5 50.5 NA
#> 2 72 25.8 50.5 NA
#> 3 73 50.5 NA 75.2
#> 4 70 50.5 75.2 NA
#> 5 70 50.5 50.5 NA
#> 6 71 50.5 50.5 NA
#> 7 70 75.2 75.2 NA
#> 8 71 75.2 75.2 NA
#> 9 71 50.5 100 NA
#> 10 70 75.2 75.2 NA

# summarize the scores
df %>% group_by(group_var) %>%
summarise_at(vars(ends_with("_scr")), funs(mean(., na.rm = TRUE)))

#> # A tibble: 4 x 4
#> group_var var1_scr var2_scr var3_scr
#> <dbl> <dbl> <dbl> <dbl>
#> 1 70 60.4 65.4 NaN
#> 2 71 58.8 75.2 NaN
#> 3 72 25.8 50.5 NaN
#> 4 73 50.5 NaN 75.2

# count all the oberservations
df %>% group_by(group_var) %>%
summarise(obs = n())
#> # A tibble: 4 x 2
#> group_var obs
#> <dbl> <int>
#> 1 70 5
#> 2 71 3
#> 3 72 1
#> 4 73 1

# my goal is to produce this dataset but using the mutate_at function
df %>% group_by(group_var) %>%
summarise(var1_scr = mean(var1_scr),
var2_scr = mean(var2_scr),
var3_scr = mean(var3_scr),
obs = n())
#> # A tibble: 4 x 5
#> group_var var1_scr var2_scr var3_scr obs
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 70 60.4 65.4 NA 5
#> 2 71 58.8 75.2 NA 3
#> 3 72 25.8 50.5 NA 1
#> 4 73 50.5 NA 75.2 1

reprex package于2019年8月15日创建(v0.3.0)

最佳答案

一个选项是在按“group_var”分组后在分组变量中添加“n”,然后执行summarise_at

library(dplyr)
df %>%
group_by(group_var) %>%
group_by(obs = n(), add = TRUE) %>%
summarise_at(vars(ends_with("_scr")), list(~mean(., na.rm = TRUE)))
# A tibble: 4 x 5
# Groups: group_var [4]
# group_var obs var1_scr var2_scr var3_scr
# <dbl> <int> <dbl> <dbl> <dbl>
#1 70 5 60.4 65.4 NaN
#2 71 3 58.8 75.2 NaN
#3 72 1 25.8 50.5 NaN
#4 73 1 50.5 NaN 75.2
<小时/>

另一种选择是使用 mutate 创建频率列,并通过将其也包含在 summarise_at 中来获取平均值(例如 >平均值(rep(3, 5)) -> 3)

df %>% 
group_by(group_var) %>%
mutate(obs = n()) %>%
summarise_at(vars(ends_with("_scr"), obs), list(~mean(., na.rm = TRUE)))
# A tibble: 4 x 5
# group_var var1_scr var2_scr var3_scr obs
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 70 60.4 65.4 NaN 5
#2 71 58.8 75.2 NaN 3
#3 72 25.8 50.5 NaN 1
#4 73 50.5 NaN 75.2 1

注意:这两者都为“obs”提供一列

<小时/>

这里,OP 的预期输出是一个汇总输出,对于该输出,summarise/summarise_at/summarise_all/summarise_if 非常有效。但是,如果我们需要使用mutate_at(仅用于演示)

df %>% 
group_by(group_var) %>%
mutate(obs = n()) %>%
mutate_at(vars(ends_with("_scr"), obs), list(~mean(., na.rm = TRUE))) %>%
distinct_at(vars(group_var, ends_with("_scr"), obs))
# A tibble: 4 x 5
# Groups: group_var [4]
# group_var var1_scr var2_scr var3_scr obs
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 70 60.4 65.4 NaN 5
#2 72 25.8 50.5 NaN 1
#3 73 50.5 NaN 75.2 1
#4 71 58.8 75.2 NaN 3

关于r - 如何使用 dplyr 来总结两个函数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57514455/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com