
r - Efficient R code for finding the indices associated with the unique values in a vector

Reposted. Author: 行者123. Updated: 2023-12-04 05:38:37

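Suppose I have the vector vec <- c("D","B","B","C","C")

My goal is to end up with a list of length length(unique(vec)), where element i of the list is a vector of indices giving the positions of unique(vec)[i] in vec.

For example, for vec this list would be: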

exampleList <- list()
exampleList[[1]] <- c(1) #Since "D" is the first element
exampleList[[2]] <- c(2,3) #Since "B" is the 2nd/3rd element.
exampleList[[3]] <- c(4,5) #Since "C" is the 4th/5th element.

I tried the following, but it is too slow. My real data is large, so I need faster code:
vec <- c("D","B","B","C","C")
uniques <- unique(vec)
exampleList <- lapply(seq_along(uniques), function(i) {
  which(vec == uniques[i])
})
exampleList

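Best answer

Update: the idiom DT[, list(list(.)), by=.] sometimes gave incorrect results in R versions >= 3.1.0. This has since been fixed in data.table, in commit #1280 of the then-current development version, v1.9.3. From NEWS: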

  • DT[, list(list(.)), by=.] returns correct results in R >=3.1.0 as well. The bug was due to recent (welcoming) changes in R v3.1.0 where list(.) does not result in a copy. Closes #481.
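
A quick way to see whether an installed data.table is recent enough for the idiom below is to compare its version number. This is only a sketch: has_fix is a hypothetical helper name, and the check is approximate, since a development build of 1.9.3 from before commit #1280 would also pass it.

```r
# Hypothetical helper: TRUE if version string `v` is at least 1.9.3,
# the development version named in the NEWS entry above.
has_fix <- function(v) package_version(v) >= "1.9.3"

# Only query the installed version if data.table is actually available.
if (requireNamespace("data.table", quietly = TRUE)) {
  installed <- format(packageVersion("data.table"))
  message("data.table ", installed,
          if (has_fix(installed)) " is at or after 1.9.3" else " predates 1.9.3")
}
```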


Using data.table is roughly 15 times faster than tapply:
library(data.table)

vec <- c("D","B","B","C","C")

dt = as.data.table(vec)[, list(list(.I)), by = vec]
dt
#    vec  V1
# 1:   D   1
# 2:   B 2,3
# 3:   C 4,5

# to get it in the desired format
# (perhaps in the future data.table's setnames will work for lists instead)
setattr(dt$V1, 'names', dt$vec)
dt$V1
#$D
#[1] 1
#
#$B
#[1] 2 3
#
#$C
#[1] 4 5
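
To check that the grouped result really matches what the question asked for, it can be compared against the which() loop from the question. The sketch below makes two substitutions that are not in the original answer: it builds the table with data.table(vec = vec) so the column name is explicit, and it names the list with base R's setNames(), which returns a named copy instead of modifying the column by reference as setattr() does.

```r
library(data.table)

vec <- c("D", "B", "B", "C", "C")

# Grouped approach: one list element of row numbers (.I) per distinct value,
# with groups appearing in order of first occurrence.
dt <- data.table(vec = vec)[, list(list(.I)), by = vec]
by_group <- setNames(dt$V1, dt$vec)

# Reference result: the which() loop from the question, named the same way.
uniques <- unique(vec)
by_loop <- setNames(lapply(uniques, function(u) which(vec == u)), uniques)

# Stops with an error if the two results differ.
stopifnot(isTRUE(all.equal(by_group, by_loop)))
```

Note that the hand-written exampleList in the question holds doubles (c(2,3)), while both approaches here return integer positions, so a strict identical() against that literal list would not be a fair test.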

Speed test:
vec = sample(letters, 1e7, replace = TRUE)

system.time(tapply(seq_along(vec), vec, identity)[unique(vec)])
#    user  system elapsed
#    7.92    0.35    8.50

system.time({dt = as.data.table(vec)[, list(list(.I)), by = vec]; setattr(dt$V1, 'names', dt$vec); dt$V1})
#    user  system elapsed
#    0.39    0.09    0.49
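
Where data.table is not available, the same grouping can be sketched in base R with split(). This variant is not part of the original answer and was not timed here, so no speed claim is made for it; it is shown only because it avoids rescanning the whole vector once per unique value, which is what the which() loop in the question does.

```r
vec <- c("D", "B", "B", "C", "C")

# factor() with levels = unique(vec) keeps the groups in order of first
# appearance; split() then partitions the positions 1..length(vec) by group.
idx <- split(seq_along(vec), factor(vec, levels = unique(vec)))
idx
#$D
#[1] 1
#
#$B
#[1] 2 3
#
#$C
#[1] 4 5
```

Without the explicit levels, split(seq_along(vec), vec) would return the groups in sorted order (B, C, D) instead.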

Regarding "r - Efficient R code for finding the indices associated with the unique values in a vector", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22993637/
