
r - Remove sequences of at least N consecutive values from a vector

Reposted. Author: 行者123. Updated: 2023-12-03 17:32:08

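Here is my problem. I have a large vector of positive data. My goal is to remove every sequence of at least N consecutive values that is repeated elsewhere in the vector (all of the repeated values must be strictly greater than 0).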

I wrote a function that does this, shown below, where:
X is my numeric vector;
N is the minimum length of a repeated sequence.

rmpParNASerieRepetee <- function(X, N)
{
  X_ <- paste("T", paste(X, collapse="T"), "T", sep="")
  ind.parcours <- 1
  ind.sup <- c()

  # Loop on the values
  while ( ind.parcours <= (length(X)-N+1) )
  {
    # indices of my sequence of N values
    deb <- ind.parcours
    fin <- ind.parcours + N-1

    # sequence of N values to search in the vector
    serie <- X[deb:fin]
    serie_ <- paste("T", paste(serie, collapse="T"), "T", sep="")


    borne <- 1*(ind.parcours < (length(X)-N+1)) + 0*(ind.parcours == (length(X)-N+1))
    if (sum(X[(length(X)-N+1):length(X)]==serie)==N) borne <- 0

    # split my string vector by my sequence vector of N values and count the pieces of result
    if ( length(unlist(strsplit(X_, serie_)))-1 > borne && length(which(serie!=0))>=N)
    { ind.sup <- unique(c(ind.sup, deb:fin)) }
    ind.parcours <- ind.parcours+1
  }
  if (length(ind.sup) > 0) { X[ind.sup] <- NA }

  list_return <- list(X=X, Ind.sup=unique(sort(ind.sup)))
  return (list_return)
}


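I think my function is far from optimal: for a vector of 92,000 values and N = 18, the computation takes about 1 h 15 min. I have to run this step 1,600 times, which would take roughly 3 months.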

Does anyone have a better idea?

Example:

x <- c(1,2,3,4,0,4,1,2,3,8,9,1,2,3,4,0)
N <- 3
# (1,2,3) is a sequence of 3 elements which is repeated
# (1,2,3,4) is sequence of 4 elements which is repeated
# no other sequence of length at least 3 repeats
# my result should also be :
# NA NA NA NA 0 4 NA NA NA 8 9 NA NA NA NA 0
# The result of my program is :
# $X
# [1] NA NA NA NA 0 4 NA NA NA 8 9 NA NA NA NA 0
#$Ind.sup
# [1] 1 2 3 4 7 8 9 12 13 14 15

Best answer

One way:

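f <- function(X, N)
{
  # count how often each distinct value occurs in X
  .rle <- rle(sort(X))
  # keep the strictly positive values that occur at least N times
  res <- .rle$values[.rle$lengths >= N]
  res <- res[res > 0]
  # replace every occurrence of those values with NA
  inds <- X %in% res
  X[inds] <- NA
  list(X = X, Ind = which(inds))
}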

#> f(X, 3)
#$X
# [1] NA NA NA NA 0 0 0 0 NA NA NA NA NA NA 8 9 NA NA NA NA NA NA 0 0 0
#
#$Ind
# [1] 1 2 3 4 9 10 11 12 13 14 17 18 19 20 21 22
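
The answer above counts how often each value occurs in the whole vector (`rle(sort(X))`), so it flags values rather than windows of consecutive values. If the window semantics described in the question are what you need, the following is a minimal sketch of one possible approach, not the answerer's method. The function name `remove_repeated_windows` is hypothetical. It assumes that overlapping occurrences of a window count as repeats and that comparing values through their `paste()` string form is precise enough for your data.

```r
# Hypothetical helper, not from the original post: set to NA every window of
# N consecutive, strictly positive values that occurs more than once in X.
remove_repeated_windows <- function(X, N) {
  n <- length(X)
  if (N < 1 || n < N) return(list(X = X, Ind.sup = integer(0)))

  # Row i of `windows` holds X[i], ..., X[i + N - 1].
  # embed() returns each window in reverse order, hence the column flip.
  windows <- embed(X, N)[, N:1, drop = FALSE]

  # One string key per window; "T" is the separator used in the question.
  keys <- do.call(paste, c(as.data.frame(windows), sep = "T"))

  # A window qualifies if its key occurs at least twice and all values are > 0.
  repeated <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
  all_positive <- rowSums(windows > 0) == N
  starts <- which(repeated & all_positive)

  # Expand every qualifying start index to the N positions it covers.
  ind <- sort(unique(rep(starts, each = N) + 0:(N - 1)))
  X[ind] <- NA
  list(X = X, Ind.sup = ind)
}

x <- c(1,2,3,4,0,4,1,2,3,8,9,1,2,3,4,0)
res <- remove_repeated_windows(x, 3)
res$Ind.sup
```

On the question's example this yields the same indices the question lists (1 2 3 4 7 8 9 12 13 14 15). Because every window is keyed once and duplicates are found in a single vectorised pass, there is no repeated scan of the whole vector per window, which is where most of the time goes in the `strsplit` loop.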

Regarding "r - Remove sequences of at least N consecutive values from a vector", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20426949/
