r - Calculate column sums for each combination of two grouping variables

Reposted. Author: 行者123. Updated: 2023-12-04 10:46:22

This question already has answers here:

How to sum a variable by group
(15 answers)

Closed 6 years ago.




I have a dataset that looks like this:

Type  Age  count1  count2  Year  Pop1   Pop2   TypeDescrip
A     35   1       1       1990  30000  50000  alpha
A     35   3       1       1990  30000  50000  alpha
A     45   2       3       1990  20000  70000  alpha
B     45   2       1       1990  20000  70000  beta
B     45   4       5       1990  20000  70000  beta

I would like to sum the counts of the rows that match in both the Type and Age columns. Ideally, I would end up with a dataset that looks like this:

Type  Age  count1  count2  Year  Pop1   Pop2   TypeDescrip
A     35   4       2       1990  30000  50000  alpha
A     45   2       3       1990  20000  70000  alpha
B     45   6       6       1990  20000  70000  beta

I have tried using nested duplicated() statements such as the following:

typedup = duplicated(df$Type)
bothdup = duplicated(df[(typedup == TRUE),]$Age)

But this returns the indices where either Age or Type is duplicated, not necessarily where a row has duplicates of both.
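
For reference, duplicated() also accepts a data frame, in which case it compares whole rows. Passing only the two key columns therefore flags rows whose Type and Age pair has already appeared. A minimal sketch, assuming a small data frame `df` built here purely for illustration (it identifies the repeated pairs but does not sum anything):

```r
# Illustrative data mirroring the Type and Age columns of the question
df <- data.frame(
  Type = c("A", "A", "A", "B", "B"),
  Age  = c(35L, 35L, 45L, 45L, 45L)
)

# duplicated() on a data frame compares complete rows, so a row is TRUE
# only when the same (Type, Age) pair occurred in an earlier row
both_dup <- duplicated(df[, c("Type", "Age")])
print(both_dup)
```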

I have also tried tapply:

tapply(c(df$count1, df$count2), c(df$Age, df$Type), sum)

But this output is difficult to work with. I would like to end up with a data.frame.
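
Part of the difficulty is that c(df$Age, df$Type) concatenates the two columns into a single vector instead of pairing them row by row. tapply() takes one value vector and a list of grouping vectors of the same length. A sketch of a per-column call, assuming the illustrative data below; the result is a matrix with NA for combinations that never occur, which is why it is awkward when a data.frame is wanted:

```r
# Illustrative data: one count column plus the two grouping columns
df <- data.frame(
  Type   = c("A", "A", "A", "B", "B"),
  Age    = c(35L, 35L, 45L, 45L, 45L),
  count1 = c(1L, 3L, 2L, 2L, 4L)
)

# INDEX is a list of grouping vectors, each as long as df$count1
sums <- tapply(df$count1, list(df$Type, df$Age), sum)

# Rows are Type, columns are Age; the unused B/35 cell is NA
print(sums)
```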

I don't want to use a for loop because my dataset is very large.

Best answer

Try

library(dplyr)
df1 %>%
group_by(Type, Age) %>%
summarise_each(funs(sum))
# Type Age count1 count2
#1 A 35 4 2
#2 A 45 2 3
#3 B 45 6 6

In newer versions of dplyr:

df1 %>%
group_by(Type, Age) %>%
summarise_all(sum)
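
In dplyr 1.0.0 and later, scoped verbs such as summarise_all() are superseded by across(). A minimal sketch of the equivalent call, assuming dplyr 1.0.0 or later is installed and using the same values as the df1 sample data:

```r
library(dplyr)

# Same values as the df1 sample data in this answer
df1 <- data.frame(
  Type   = c("A", "A", "A", "B", "B"),
  Age    = c(35L, 35L, 45L, 45L, 45L),
  count1 = c(1L, 3L, 2L, 2L, 4L),
  count2 = c(1L, 1L, 3L, 1L, 5L)
)

# across(everything(), sum) sums every non-grouping column;
# .groups = "drop" returns an ungrouped tibble
result <- df1 %>%
  group_by(Type, Age) %>%
  summarise(across(everything(), sum), .groups = "drop")

print(result)
```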

Or using base R:

 aggregate(.~Type+Age, df1, FUN=sum)
# Type Age count1 count2
#1 A 35 4 2
#2 A 45 2 3
#3 B 45 6 6

Or

library(data.table)
setDT(df1)[, lapply(.SD, sum), .(Type, Age)]
# Type Age count1 count2
#1: A 35 4 2
#2: A 45 2 3
#3: B 45 6 6

Update

Based on the new dataset:

 df2 %>%
group_by(Type, Age,Pop1, Pop2, TypeDescrip) %>%
summarise_each(funs(sum), matches('^count'))
# Type Age Pop1 Pop2 TypeDescrip count1 count2
#1 A 35 30000 50000 alpha 4 2
#2 A 45 20000 70000 beta 2 3
#3 B 45 20000 70000 beta 6 6
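
The same idea can be sketched in base R. Putting cbind() on the left-hand side of the formula restricts the summation to the count columns, and the descriptive columns go on the right-hand side as grouping variables. This assumes, as in df2, that Pop1, Pop2, and TypeDescrip are constant within each Type and Age pair; Year is left out of the formula, so it is dropped from the result.

```r
# Same values as the df2 sample data in this answer
df2 <- data.frame(
  Type        = c("A", "A", "A", "B", "B"),
  Age         = c(35L, 35L, 45L, 45L, 45L),
  count1      = c(1L, 3L, 2L, 2L, 4L),
  count2      = c(1L, 1L, 3L, 1L, 5L),
  Year        = c(1990L, 1990L, 1990L, 1990L, 1990L),
  Pop1        = c(30000L, 30000L, 20000L, 20000L, 20000L),
  Pop2        = c(50000L, 50000L, 70000L, 70000L, 70000L),
  TypeDescrip = c("alpha", "alpha", "beta", "beta", "beta")
)

# Sum only count1 and count2; every other listed column is a grouping key
result <- aggregate(
  cbind(count1, count2) ~ Type + Age + Pop1 + Pop2 + TypeDescrip,
  data = df2,
  FUN  = sum
)

# Sort explicitly so the row order does not depend on aggregate()'s grouping order
result <- result[order(result$Type, result$Age), ]
print(result)
```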

Data
 df1 <- structure(list(Type = c("A", "A", "A", "B", "B"), Age = c(35L, 
35L, 45L, 45L, 45L), count1 = c(1L, 3L, 2L, 2L, 4L), count2 = c(1L,
1L, 3L, 1L, 5L)), .Names = c("Type", "Age", "count1", "count2"
), class = "data.frame", row.names = c(NA, -5L))

df2 <- structure(list(Type = c("A", "A", "A", "B", "B"), Age = c(35L,
35L, 45L, 45L, 45L), count1 = c(1L, 3L, 2L, 2L, 4L), count2 = c(1L,
1L, 3L, 1L, 5L), Year = c(1990L, 1990L, 1990L, 1990L, 1990L),
Pop1 = c(30000L, 30000L, 20000L, 20000L, 20000L), Pop2 = c(50000L,
50000L, 70000L, 70000L, 70000L), TypeDescrip = c("alpha",
"alpha", "beta", "beta", "beta")), .Names = c("Type", "Age",
"count1", "count2", "Year", "Pop1", "Pop2", "TypeDescrip"),
class = "data.frame", row.names = c(NA, -5L))

Regarding "r - Calculate column sums for each combination of two grouping variables", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31190930/
