r - dplyr function to compute average, n, sd and standard error

Reposted. Author: 行者123. Updated: 2023-12-02 08:13:09

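I find myself writing this bit of code all the time to produce the standard errors of group means (which I then use to plot confidence intervals).

It would be nice, though, to write my own function that does this in one line of code. I have read the NSE vignette in dplyr on non-standard evaluation as well as this blog post. I understand some of it, but I'm too much of a beginner to work this out on my own. Can anyone help? Thanks.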

library(dplyr)

var1 <- sample(c('red', 'green'), size = 10, replace = TRUE)
var2 <- rnorm(10, mean = 5, sd = 1)
df <- data.frame(var1, var2)
df %>%
  group_by(var1) %>%
  summarize(avg = mean(var2), n = n(), sd = sd(var2), se = sd/sqrt(n))
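For context, the standard error in the snippet above is usually turned into interval bounds before plotting. The following is only a sketch of one common choice, a two-sided 95% interval based on the t distribution; the 95% level, the data frame `demo`, and the column names `t_crit`, `lower` and `upper` are assumptions for illustration and do not come from the question.

```r
library(dplyr)

# Illustrative data (not the data from the question)
set.seed(42)
demo <- data.frame(
  var1 = rep(c("red", "green"), each = 5),
  var2 = rnorm(10, mean = 5, sd = 1)
)

ci_tbl <- demo %>%
  group_by(var1) %>%
  summarize(avg = mean(var2), n = n(), sd = sd(var2), se = sd / sqrt(n)) %>%
  mutate(
    t_crit = qt(0.975, df = n - 1),  # assumed: two-sided 95% critical value
    lower  = avg - t_crit * se,      # lower bound of the interval
    upper  = avg + t_crit * se       # upper bound of the interval
  )

ci_tbl
```

The `lower` and `upper` columns can then be mapped to error bars in whatever plotting package is used.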

Best answer

You can use the enquo function to capture the variables that are named, unquoted, in the function call:

my_fun <- function(x, cat_var, num_var){
  # capture the unquoted column names as quosures
  cat_var <- enquo(cat_var)
  num_var <- enquo(num_var)

  # unquote them with !! inside the dplyr verbs
  x %>%
    group_by(!!cat_var) %>%
    summarize(avg = mean(!!num_var), n = n(),
              sd = sd(!!num_var), se = sd/sqrt(n))
}

This gives you:
> my_fun(df, var1, var2)
# A tibble: 2 x 5
    var1      avg     n        sd        se
  <fctr>    <dbl> <int>     <dbl>     <dbl>
1  green 4.873617     7 0.7515280 0.2840509
2    red 5.337151     3 0.1383129 0.0798550

And it matches the output of your example:
> df %>%
+   group_by(var1) %>%
+   summarize(avg = mean(var2), n = n(), sd = sd(var2), se = sd/sqrt(n))
# A tibble: 2 x 5
    var1      avg     n        sd        se
  <fctr>    <dbl> <int>     <dbl>     <dbl>
1  green 4.873617     7 0.7515280 0.2840509
2    red 5.337151     3 0.1383129 0.0798550
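As a side note, the enquo/!! pair can be written more compactly with the embrace operator `{{ }}`, which is available in rlang 0.4.0 and later. The sketch below is an alternative spelling rather than the answer's own code; the name `my_fun_embrace` and the data frame `demo` are hypothetical.

```r
library(dplyr)

# Hypothetical variant of my_fun: {{ }} captures and unquotes in one step
my_fun_embrace <- function(x, cat_var, num_var) {
  x %>%
    group_by({{ cat_var }}) %>%
    summarize(
      avg = mean({{ num_var }}),
      n   = n(),
      sd  = sd({{ num_var }}),
      se  = sd / sqrt(n)
    )
}

# Illustrative data (not the data from the question)
set.seed(1)
demo <- data.frame(
  var1 = rep(c("red", "green"), each = 5),
  var2 = rnorm(10, mean = 5, sd = 1)
)

res_embrace <- my_fun_embrace(demo, var1, var2)
res_embrace
```

The call site is unchanged: column names are still passed unquoted, exactly as with my_fun.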

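Edit:

The OP asked to remove the group_by statement from the function so that group_by can take multiple variables. IMO there are two ways to do this. First, you can simply remove the group_by statement and pipe an already grouped data frame into the function. That approach looks like this:
my_fun <- function(x, num_var){
  num_var <- enquo(num_var)

  x %>%
    summarize(avg = mean(!!num_var), n = n(),
              sd = sd(!!num_var), se = sd/sqrt(n))
}

df %>%
  group_by(var1) %>%
  my_fun(var2)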

The other way to solve this is to use ... together with quos, which lets the function capture multiple arguments for the group_by statement. It looks like this:
# first, build the new data frame
var1 <- sample(c('red', 'green'), size = 10, replace = TRUE)
var2 <- rnorm(10, mean = 5, sd = 1)
var3 <- sample(c("A", "B"), size = 10, replace = TRUE)
df <- data.frame(var1, var2, var3)

# using the first version `my_fun`, it would look like this
df %>%
  group_by(var1, var3) %>%
  my_fun(var2)

# A tibble: 4 x 6
# Groups: var1 [?]
    var1   var3      avg     n        sd        se
  <fctr> <fctr>    <dbl> <int>     <dbl>     <dbl>
1  green      A 5.248095     1       NaN       NaN
2  green      B 5.589881     2 0.7252621 0.5128378
3    red      A 5.364265     2 0.5748759 0.4064986
4    red      B 4.908226     5 1.1437186 0.5114865

# Now doing it with a new function `my_fun2`
my_fun2 <- function(x, num_var, ...){
  group_var <- quos(...)     # capture all grouping columns passed via ...
  num_var <- enquo(num_var)

  x %>%
    group_by(!!!group_var) %>%   # splice the captured columns into group_by
    summarize(avg = mean(!!num_var), n = n(),
              sd = sd(!!num_var), se = sd/sqrt(n))
}

df %>%
  my_fun2(var2, var1, var3)

# A tibble: 4 x 6
# Groups: var1 [?]
    var1   var3      avg     n        sd        se
  <fctr> <fctr>    <dbl> <int>     <dbl>     <dbl>
1  green      A 5.248095     1       NaN       NaN
2  green      B 5.589881     2 0.7252621 0.5128378
3    red      A 5.364265     2 0.5748759 0.4064986
4    red      B 4.908226     5 1.1437186 0.5114865
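With more recent dplyr versions, the dots can also be forwarded to group_by directly, without quos and !!!. The following is a sketch under that assumption and not part of the original answer: the name `my_fun3` and the data frame `demo` are hypothetical, and `.groups = "drop"` (available from dplyr 1.0.0) is an added choice that returns an ungrouped result.

```r
library(dplyr)

# Hypothetical variant of my_fun2: forward ... straight to group_by()
my_fun3 <- function(x, num_var, ...) {
  x %>%
    group_by(...) %>%
    summarize(
      avg = mean({{ num_var }}),
      n   = n(),
      sd  = sd({{ num_var }}),
      se  = sd / sqrt(n),
      .groups = "drop"   # assumed preference: drop all grouping afterwards
    )
}

# Illustrative data with two observations per var1/var3 combination
demo <- data.frame(
  var1 = rep(c("red", "green"), each = 4),
  var3 = rep(c("A", "B"), times = 4),
  var2 = c(4.1, 5.3, 4.8, 5.6, 5.0, 4.4, 5.9, 5.2)
)

res3 <- my_fun3(demo, var2, var1, var3)
res3
```

Leaving out `.groups` keeps dplyr's default, in which the result stays grouped by all but the last grouping variable, as the `# Groups: var1 [?]` line in the output above shows.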

Regarding "r - dplyr function to compute average, n, sd and standard error", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44266376/
