
R ddply in a loop; multiple factors

Reposted. Author: 行者123. Updated: 2023-12-04 20:40:28

I want to use ddply to summarise several variables across multiple factors.

I have the following test data:

site    block   plot    rep name    weight  height  dtf
Alberta 1 2 1 A 43 139 54
Alberta 2 5 2 A 46 139 46
Alberta 4 10 3 A 49 136 54
Nunavut 1 1 1 A 49 136 59
Nunavut 2 4 2 A 51 135 50
Nunavut 3 8 3 A 52 133 56
Alberta 5 13 1 B 55 132 50
Alberta 4 12 2 B 55 125 46
Alberta 5 15 3 B 56 120 46
Nunavut 5 14 1 B 57 119 54
Nunavut 5 13 2 B 58 119 55
Nunavut 4 11 3 B 59 118 51
...

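and so on.

I want to take the variables "weight", "height" and "dtf" and summarise each of them by the factors "site" and "name".

I start by building vectors of column names: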
data.factors <- NULL
data.variables <- NULL
for(n in 1:length(data)){if(is.factor(data[[n]])){ data.factors <- c(data.factors,colnames(data[n]))} else next}
for(n in 1:length(data)){if(is.numeric(data[[n]]) || is.integer(data[[n]])){ data.variables <- c(data.variables,colnames(data[n]))} else next}

This works for running multiple one-way ANOVAs:
for(variables in data.variables){
for(factors in data.factors){
output1 <- aov(lm(data[[variables]]~data[[factors]]))
cat(variables)
cat(" by ")
cat(factors)
cat("\n")
print(summary(output1))
}}

But I can't get it to work with ddply:
for (x in data.variables){
variable.summary <- ddply(data, .(site,name), summarise,
N = sum(!is.na(x[1])),
min = min(x[1], na.rm=TRUE),
max = max(x[1], na.rm=TRUE),
mean = mean(x[1], na.rm=TRUE),
sd = sd(x[1], na.rm=TRUE),
se = sd / sqrt(N)
)
print(variable.summary)
}

All I get is the following:
site name N    min    max mean sd se
1 Alberta A 1 weight weight NA NA NA
2 Alberta B 1 weight weight NA NA NA
3 Alberta C 1 weight weight NA NA NA
4 Alberta D 1 weight weight NA NA NA
5 Alberta E 1 weight weight NA NA NA
6 Nunavut A 1 weight weight NA NA NA
7 Nunavut B 1 weight weight NA NA NA
8 Nunavut C 1 weight weight NA NA NA
9 Nunavut D 1 weight weight NA NA NA
10 Nunavut E 1 weight weight NA NA NA
....

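If I test ddply with a single variable typed in directly, rather than one referenced through "x", it works fine.

Is there a trick to make the function recognise a column name held in a variable? I'm used to Perl, where a $scalar can be interpolated anywhere, and I was hoping R had something similar.

Accepted answer

dplyr, the successor to plyr (the package that provides ddply), can do this very easily with group_by() and summarise_each(), without looping over anything: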

df <- data.frame(site = c("Alberta", "Alberta", "Alberta", "Nunavut", "Nunavut", "Nunavut", "Alberta", "Alberta", "Alberta", "Nunavut", "Nunavut", "Nunavut"),
block = c(1, 2, 4, 1, 2, 3, 5, 4, 5, 5, 5, 4),
plot = c(2, 5, 10, 1, 4, 8, 13, 12, 15, 14, 13, 11),
rep = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
name = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
weight = c(43, 46, 49, 49, 51, 52, 55, 55, 56, 57, 58, 59),
height = c(139, 139, 136, 136, 135, 133, 132, 125, 120, 119, 119, 118),
dtf = c(54, 46, 54, 59, 50, 56, 50, 46, 46, 54, 55, 51))

library(dplyr)

df.summary <- df %>%
group_by(site, name) %>%
summarise_each(funs(length, min, max, mean, sd), weight, height, dtf)

The result is a data frame like this:
> df.summary
Source: local data frame [4 x 17]
Groups: site

site name weight_length height_length dtf_length weight_min height_min dtf_min
1 Alberta A 3 3 3 43 136 46
2 Alberta B 3 3 3 55 120 46
3 Nunavut A 3 3 3 49 133 50
4 Nunavut B 3 3 3 57 118 51
Variables not shown: weight_max (dbl), height_max (dbl), dtf_max (dbl), weight_mean (dbl),
height_mean (dbl), dtf_mean (dbl), weight_sd (dbl), height_sd (dbl), dtf_sd (dbl)
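
For readers on a current dplyr: summarise_each() and funs() have since been deprecated, and across() is the documented replacement. The following is a minimal sketch of the same grouped summary, assuming dplyr 1.0.0 or later; the function labels (n, min, max, mean, sd, se) are illustrative choices rather than anything from the original answer, and the data frame is the sample data above reduced to the columns used.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, which introduced across()

# Sample data from the question, keeping only the grouping and measured columns
df <- data.frame(
  site   = rep(c("Alberta", "Nunavut", "Alberta", "Nunavut"), each = 3),
  name   = rep(c("A", "B"), each = 6),
  weight = c(43, 46, 49, 49, 51, 52, 55, 55, 56, 57, 58, 59),
  height = c(139, 139, 136, 136, 135, 133, 132, 125, 120, 119, 119, 118),
  dtf    = c(54, 46, 54, 59, 50, 56, 50, 46, 46, 54, 55, 51)
)

df.summary <- df %>%
  group_by(site, name) %>%
  summarise(
    across(
      c(weight, height, dtf),
      # A named list of functions; output columns are named <column>_<label>
      list(
        n    = ~ sum(!is.na(.x)),
        min  = ~ min(.x, na.rm = TRUE),
        max  = ~ max(.x, na.rm = TRUE),
        mean = ~ mean(.x, na.rm = TRUE),
        sd   = ~ sd(.x, na.rm = TRUE),
        se   = ~ sd(.x, na.rm = TRUE) / sqrt(sum(!is.na(.x)))
      )
    ),
    .groups = "drop"  # return an ungrouped data frame
  )

print(df.summary)
```

Each function handles missing values itself, so n counts only non-missing observations, unlike length().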

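You can pass any function you like to funs() inside summarise_each(), so if you want a column of standard errors, just define the function first:
se <- function(x) {
  N <- sum(!is.na(x))                      # number of non-missing values
  return(sd(x, na.rm = TRUE) / sqrt(N))    # standard error of the mean
}

and pass it in: summarise_each(funs(length, min, max, mean, sd, se), ...)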
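
On the original question of referring to a column through a variable: inside the loop, x holds the character string "weight" rather than the column, so min(x[1]) simply returns "weight" and mean() and sd() return NA. If the loop is still wanted, one possible approach (a sketch, not part of the original answer) is to pass ddply a function and look the column up with d[[x]]. The summaries list is a hypothetical container introduced here for illustration.

```r
# Sample data from the question, keeping only the grouping and measured columns
df <- data.frame(
  site   = rep(c("Alberta", "Nunavut", "Alberta", "Nunavut"), each = 3),
  name   = rep(c("A", "B"), each = 6),
  weight = c(43, 46, 49, 49, 51, 52, 55, 55, 56, 57, 58, 59),
  height = c(139, 139, 136, 136, 135, 133, 132, 125, 120, 119, 119, 118),
  dtf    = c(54, 46, 54, 59, 50, 56, 50, 46, 46, 54, 55, 51)
)

data.variables <- c("weight", "height", "dtf")

# Hypothetical container: one summary data frame per variable
summaries <- list()

for (x in data.variables) {
  # plyr:: is used instead of library(plyr) to avoid masking dplyr functions;
  # the grouping columns are given as a character vector
  summaries[[x]] <- plyr::ddply(df, c("site", "name"), function(d) {
    v <- d[[x]]               # look the column up by its name (a string)
    n <- sum(!is.na(v))
    s <- sd(v, na.rm = TRUE)
    data.frame(
      N    = n,
      min  = min(v, na.rm = TRUE),
      max  = max(v, na.rm = TRUE),
      mean = mean(v, na.rm = TRUE),
      sd   = s,
      se   = s / sqrt(n)
    )
  })
}

print(summaries$weight)
```

The double-bracket lookup d[[x]] is the closest R equivalent to interpolating a scalar: it takes the string in x and returns the column with that name.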

Regarding "R ddply in a loop; multiple factors", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/27414068/
