
r - Allowing .SDcols to vary by grouping variable in data.table

Reposted. Author: 行者123. Updated: 2023-12-04 11:34:29

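Is it possible for .SDcols to vary with the by grouping variable? I have the following situation, where I want .SDcols to be a different set of columns for each year. The .SDcols values are stored in one data.table, and I am trying to apply a function to .SD in another table using those values.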

It is quite possible that I am missing the obvious approach and going about this the wrong way, but here is what I am trying:

library(data.table)

## Contains the .SDcols applicable to each year
dat1 <- data.table(
  year = 1:4,
  vals = lapply(1:4, function(i) letters[1:i])
)

## Make the sample data (with NAs)
set.seed(1775)
dat2 <- data.table( year = sample(1:4, 10, TRUE) )
dat2[, letters[1:4] := replicate(4, sample(c(NA, 1:5), 10, TRUE), simplify=FALSE)]

## Goal: Sum up the columns in the corresponding .SDcols for each year
## Attempt, doesn't work -- I think b/c .SDcols must be fixed?
dat2[, SUM := rowSums(.SD, na.rm=TRUE), by=year,
     .SDcols=unlist(dat1[year == .BY[[1]], vals])]

## Desired result, by simply iterating through each possible year
for (i in 1:4) {
  dat2[year==i, SUM := rowSums(.SD, na.rm=TRUE),
       .SDcols=unlist(dat1[year == i, vals])]
}

dat2[]
#     year  a  b c  d SUM
#  1:    1  3  1 5  1   3
#  2:    2  1  3 3  1   4
#  3:    1  5  4 3 NA   5
#  4:    4  1 NA 4  5  10
#  5:    2  2  2 2 NA   4
#  6:    2 NA  3 3 NA   3
#  7:    4  2  3 2 NA   7
#  8:    1  2 NA 5  4   2
#  9:    2  3  3 5  1   6
# 10:    3 NA  4 2 NA   6

Best answer

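It seems to me that you are just looking for a simple join that updates the values (by reference) once for each row of dat1 (by = .EACHI). Either way, rowSums is the bottleneck in both this solution and your attempt (because of the matrix conversion). If I were you, I would convert all the NAs to zero and run Reduce(`+`, ...) instead (though I am not sure whether you want to change the values in the original data).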

dat2[dat1,
     SUM := rowSums(.SD[, unlist(i.vals), with = FALSE], na.rm = TRUE),
     on = "year",
     by = .EACHI]
dat2
#     year  a  b c  d SUM
#  1:    1  3  1 5  1   3
#  2:    2  1  3 3  1   4
#  3:    1  5  4 3 NA   5
#  4:    4  1 NA 4  5  10
#  5:    2  2  2 2 NA   4
#  6:    2 NA  3 3 NA   3
#  7:    4  2  3 2 NA   7
#  8:    1  2 NA 5  4   2
#  9:    2  3  3 5  1   6
# 10:    3 NA  4 2 NA   6
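
To make the role of by = .EACHI more concrete, here is a small sketch on made-up data. The tables `cols_by_year` and `toy` are hypothetical and are not the question's data. It only inspects what j sees for each row of the table passed as i: how many rows of `toy` matched, and which column names sit in that row's list cell.

```r
library(data.table)

# Hypothetical lookup table: the columns to use for each year
cols_by_year <- data.table(
  year = 1:3,
  vals = list("a", c("a", "b"), c("a", "b", "c"))
)

# Hypothetical data with a few NAs
toy <- data.table(
  year = c(1L, 2L, 2L, 3L),
  a    = c(1L, 2L, NA, 4L),
  b    = c(10L, 20L, 30L, NA),
  c    = c(100L, 200L, 300L, 400L)
)

# j is evaluated once per row of cols_by_year;
# i.vals is the list cell belonging to that row
seen <- toy[cols_by_year,
            .(n_rows = .N, cols = paste(unlist(i.vals), collapse = ",")),
            on = "year",
            by = .EACHI]
print(seen)
```

Because j runs separately for every row of `cols_by_year`, the set of column names can differ from one year to the next, which is what a single fixed .SDcols argument does not offer.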

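That said, as mentioned above, if I were you I would convert the NAs to zero and use Reduce instead:

## Replace NAs with 0 by reference (this modifies dat2 in place)
for(j in 2:ncol(dat2)) set(dat2, i = which(is.na(dat2[[j]])), j = j, value = 0L)
dat2[dat1,
     SUM := Reduce(`+`, .SD[, unlist(i.vals), with = FALSE]),
     on = "year",
     by = .EACHI]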
dat2
#     year a b c d SUM
#  1:    1 3 1 5 1   3
#  2:    2 1 3 3 1   4
#  3:    1 5 4 3 0   5
#  4:    4 1 0 4 5  10
#  5:    2 2 2 2 0   4
#  6:    2 0 3 3 0   3
#  7:    4 2 3 2 0   7
#  8:    1 2 0 5 4   2
#  9:    2 3 3 5 1   6
# 10:    3 0 4 2 0   6
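
If changing the NAs in the original data is not acceptable, one possible variation is to replace them only inside j, on the temporary subset of selected columns. This is a sketch on hypothetical data (`cols_by_year` and `toy` are made up for illustration) and is not part of the original answer.

```r
library(data.table)

# Hypothetical lookup table: the columns to use for each year
cols_by_year <- data.table(
  year = 1:3,
  vals = list("a", c("a", "b"), c("a", "b", "c"))
)

# Hypothetical data with a few NAs
toy <- data.table(
  year = c(1L, 2L, 2L, 3L),
  a    = c(1L, 2L, NA, 4L),
  b    = c(10L, 20L, 30L, NA),
  c    = c(100L, 200L, 300L, 400L)
)

# For each year, pick that year's columns, zero out the NAs in the
# temporary copies only, then add the columns together with Reduce
toy[cols_by_year,
    SUM := Reduce(`+`,
                  lapply(.SD[, unlist(i.vals), with = FALSE],
                         function(x) replace(x, is.na(x), 0L))),
    on = "year",
    by = .EACHI]
print(toy)
```

The NAs in columns `a` and `b` of `toy` stay as they were; only `SUM` is added. The price is an extra temporary vector per column and group, so this may give back some of the speed that Reduce gains over rowSums.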

Regarding "r - Allowing .SDcols to vary by grouping variable in data.table", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35500985/
