
r - Aggregating when each observation can belong to multiple groups

Reposted. Author: 行者123. Updated: 2023-12-01 10:21:53

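I want to aggregate data by group. However, each observation can belong to several groups (for example, observation 1 belongs to both group A and group B). I couldn't find a good way to do this with data.table. Currently I create one logical variable per possible group, which is TRUE if the observation belongs to that group. I'm looking for a better approach than the one shown below. I'd also like to know how to do this with the tidyverse.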

library(data.table)
# Data
set.seed(1)
TF <- c(TRUE, FALSE)
time <- rep(1:4, each = 5)
df <- data.table(time = time, x = rnorm(20), groupA = sample(TF, size = 20, replace = TRUE),
groupB = sample(TF, size = 20, replace = TRUE),
groupC = sample(TF, size = 20, replace = TRUE))

# This should be nicer and less repetitive
df[groupA == TRUE, .(A = sum(x)), by = time][
df[groupB == TRUE, .(B = sum(x)), by = time], on = "time"][
df[groupC == TRUE, .(C = sum(x)), by = time], on = "time"]

# Desired output
#    time          A          B         C
# 1:    1         NA  0.9432955 0.1331984
# 2:    2  1.2257538  0.2427420 0.1882493
# 3:    3 -0.1992284 -0.1992284 1.9016244
# 4:    4  0.5327774  0.9438362 0.9276459

Best answer

Here is a solution using data.table:

df[, lapply(.SD[, .(groupA, groupB, groupC)]*x, sum), time]
# > df[, lapply(.SD[, .(groupA, groupB, groupC)]*x, sum), time]
# time groupA groupB groupC
# 1: 1 0.0000000 0.9432955 0.1331984
# 2: 2 1.2257538 0.2427420 0.1882493
# 3: 3 -0.1992284 -0.1992284 1.9016244
# 4: 4 0.5327774 0.9438362 0.9276459
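
Why this works: multiplying a logical column by x coerces TRUE/FALSE to 1/0, so rows outside a group contribute 0 to the sum. One consequence is that a time with no members in a group yields 0 rather than the NA shown in the desired output. The sketch below illustrates both behaviours on a small toy table. The values, the name `dt`, and the helper `sum_or_na` are made up for illustration and are not part of the original answer.

```r
library(data.table)

# Toy data (hypothetical values, not the question's random draws)
dt <- data.table(
  time   = c(1, 1, 2, 2),
  x      = c(10, 20, 30, 40),
  groupA = c(FALSE, FALSE, TRUE, TRUE),
  groupB = c(TRUE, FALSE, TRUE, FALSE)
)
grp_cols <- c("groupA", "groupB")

# TRUE/FALSE become 1/0 under multiplication, so non-members add 0
res_zero <- dt[, lapply(.SD * x, sum), by = time, .SDcols = grp_cols]

# Hypothetical helper: return NA when no row in the current time belongs to the group
sum_or_na <- function(flag, value) if (any(flag)) sum(value[flag]) else NA_real_
res_na <- dt[, lapply(.SD, sum_or_na, value = x), by = time, .SDcols = grp_cols]

print(res_zero)
print(res_na)
```

If NA for empty groups matters, the second form may be closer to the desired output; otherwise the multiplication trick is shorter.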

Or, more programmatically (thanks to @chinsoon12's comment):

df[, lapply(.SD*x, sum), by=.(time), .SDcols=paste0("group", c("A","B","C"))]
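
The question also asks about the tidyverse, which the answer does not cover. A possible dplyr counterpart of the column-wise pattern is sketched here, assuming dplyr 1.0 or later (for `across()`) and that the membership columns all start with "group". The toy tibble `df_toy` and its values are invented for illustration.

```r
library(dplyr)

# Toy data (hypothetical values, not the question's random draws)
df_toy <- tibble(
  time   = c(1, 1, 2, 2),
  x      = c(10, 20, 30, 40),
  groupA = c(FALSE, FALSE, TRUE, TRUE),
  groupB = c(TRUE, FALSE, TRUE, FALSE)
)

# For each membership column, sum x over the rows where the flag is TRUE
res_tidy <- df_toy %>%
  group_by(time) %>%
  summarise(across(starts_with("group"), ~ sum(x[.x])), .groups = "drop")

print(res_tidy)
```

As with the data.table version, a time with no members in a group gives 0 here, not NA.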

If you want the result in long format, you can do this:

df[, colSums(.SD*x), by=.(time), .SDcols=paste0("group", c("A","B","C"))]
### with indicator for the group:
df[, .(colSums(.SD*x), c("A","B","C")), by=.(time), .SDcols=paste0("group", c("A","B","C"))]
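
An alternative route to a long result, offered as a sketch and not as part of the original answer, is to reshape the membership flags with `melt()` first and then aggregate by time and group. This gives named columns but drops time/group combinations that have no members. The toy table and the column names `group`, `member`, and `total` are illustrative assumptions.

```r
library(data.table)

# Toy data (hypothetical values, not the question's random draws)
dt <- data.table(
  time   = c(1, 1, 2, 2),
  x      = c(10, 20, 30, 40),
  groupA = c(FALSE, FALSE, TRUE, TRUE),
  groupB = c(TRUE, FALSE, TRUE, FALSE)
)

# One row per observation and group, with a logical membership flag
long <- melt(dt, id.vars = c("time", "x"),
             measure.vars = c("groupA", "groupB"),
             variable.name = "group", value.name = "member")

# Keep only member rows, then sum x within each time/group pair
res_long <- long[member == TRUE, .(total = sum(x)), by = .(time, group)]

print(res_long)
```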

For more on "r - Aggregating when each observation can belong to multiple groups", see the similar question on Stack Overflow: https://stackoverflow.com/questions/50483582/
