
r - How do I refer to the whole row when creating a new column in data.table?

Repost. Author: 行者123. Updated: 2023-12-04 09:38:40

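I have a data.table with more than 200 variables, all of them binary. I want to create a new column in it that counts the differences between each row and a reference vector: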

# Example
library(data.table)

dt = data.table(
  "V1" = c(1,1,0,1,0,0,0,1,0,1,0,1,1,0,1,0),
  "V2" = c(0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,0),
  "V3" = c(0,0,0,1,1,1,1,0,1,0,1,0,1,0,1,0),
  "V4" = c(1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0),
  "V5" = c(1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0)
)

reference = c(1,1,0,1,0)

I can do this with a small for loop, such as:
distance = NULL
for(i in 1:nrow(dt)){
  distance[i] = sum(reference != dt[i,])
}

But this is rather slow and surely not the best way to do it. I tried:
dt[,"distance":= sum(reference != c(V1,V2,V3,V4,V5))]
dt[,"distance":= sum(reference != .SD)]

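But neither works, because both return the same value for every row. Also, a solution that does not require typing out every variable name would be much better, since the real data.table has more than 200 columns.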
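To illustrate why both attempts give one value for every row, here is a minimal sketch on a small hypothetical table (three columns, not the data from the question). Without `by`, `j` is evaluated once on whole columns, so `sum()` collapses the entire table to a single number that `:=` then recycles. Grouping by row number is one way to make `.SD` a single row; it is shown only to explain the behaviour and is likely slower than a vectorised approach on large tables.

```r
library(data.table)

# Hypothetical toy data, chosen so the result is easy to check by hand
dt <- data.table(V1 = c(1, 0, 0), V2 = c(1, 0, 1), V3 = c(0, 1, 1))
reference <- c(1, 1, 0)
cols <- c("V1", "V2", "V3")

# No `by`: c(V1, V2, V3) is one vector of 9 values, reference is recycled
# along it, and sum() returns a single number for the whole table.
whole_table <- dt[, sum(reference != c(V1, V2, V3))]

# Grouping by row number: .SD is a one-row table in each group,
# so the comparison is made row by row.
dt[, distance := sum(reference != unlist(.SD)),
   by = seq_len(nrow(dt)), .SDcols = cols]

print(whole_table)
print(dt)
```

The `.SDcols` argument keeps the comparison restricted to the binary columns, so rerunning the line after `distance` exists does not feed that column back into the comparison.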

Best answer

You can use sweep() together with rowSums(), i.e.

rowSums(sweep(dt, 2, reference) != 0)
#[1] 2 2 2 2 4 4 3 2 4 3 2 1 3 4 1 3
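Since the question asks for a new column, the same expression can be assigned with `:=`. The following is a sketch rather than part of the original answer: it uses only the first six rows of the example data, converts `.SD` to a matrix before calling `sweep()` (an assumption made here to keep the arithmetic on a plain matrix), and selects columns through a hypothetical `cols` vector so that no variable names have to be typed out.

```r
library(data.table)

# First six rows of the example table from the question
dt <- data.table(
  V1 = c(1, 1, 0, 1, 0, 0),
  V2 = c(0, 1, 0, 1, 0, 1),
  V3 = c(0, 0, 0, 1, 1, 1),
  V4 = c(1, 0, 1, 0, 1, 0),
  V5 = c(1, 1, 0, 0, 1, 1)
)
reference <- c(1, 1, 0, 1, 0)

# Capture the binary columns once, before the new column is added
cols <- names(dt)

# Subtract reference[j] from column j, then count the non-zero entries per row
dt[, distance := rowSums(sweep(as.matrix(.SD), 2, reference) != 0),
   .SDcols = cols]

print(dt$distance)
```

Storing the column names in `cols` before the assignment means the line can be rerun without `distance` itself being swept against `reference`.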

Benchmark
library(microbenchmark)

# Note: `:=` adds column I to the table passed in by reference,
# so dt1 keeps that extra column after the first HUGH() call.
HUGH <- function(dt) {
  dt[, I := .I]
  distance_by_I <- melt(dt, id.vars = "I")[, .(distance = sum(reference != value)), keyby = "I"]
  return(dt[distance_by_I, on = "I"])
}

Sotos <- function(dt) {
  return(rowSums(sweep(dt, 2, reference) != 0))
}

dt1 <- as.data.table(replicate(5, sample(c(0, 1), 100000, replace = TRUE)))
microbenchmark(HUGH(dt1), Sotos(dt1))

#Unit: milliseconds
#       expr       min        lq      mean   median        uq       max neval cld
#  HUGH(dt1) 112.71936 117.03380 124.05758 121.6537 128.09904 155.68470   100   b
# Sotos(dt1)  23.66799  31.11618  33.84753  32.8598  34.02818  68.75044   100  a
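The benchmark uses five columns, while the real table in the question has more than 200. As a rough sketch under assumed sizes (1000 rows, 200 columns, random data, all names hypothetical), the check below confirms that the sweep()/rowSums() approach and a plain row-by-row loop give the same distances on a wide table. It makes no timing claims.

```r
library(data.table)

set.seed(1)
n_col <- 200  # assumed width, matching the "200+ columns" in the question
wide <- as.data.table(replicate(n_col, sample(c(0, 1), 1000, replace = TRUE)))
wide_ref <- sample(c(0, 1), n_col, replace = TRUE)

# Work on a plain numeric matrix so both methods see identical input
m <- as.matrix(wide)

# Vectorised: subtract the reference from each column, count non-zero cells per row
vectorised <- rowSums(sweep(m, 2, wide_ref) != 0)

# Row-by-row loop, equivalent in spirit to the for loop in the question
looped <- vapply(seq_len(nrow(m)),
                 function(i) sum(wide_ref != m[i, ]),
                 integer(1))

stopifnot(all(vectorised == looped))
```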

Regarding "r - How do I refer to the whole row when creating a new column in data.table?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54347114/
