
r - split (1:n)[boolean] into contiguous sequences

Reposted. Author: 行者123. Last updated: 2023-12-02 05:37:43

I want to split my data into groups of consecutive rows that pass some test. Here is an example:

set.seed(1)
n <- 29
ok <- sample(c(TRUE,FALSE),n,replace=TRUE,prob=c(.7,.3))

vec <- (1:n)[ok]
# [1] 1 2 3 5 8 9 10 11 12 13 14 16 19 22 23 24 25 26 27 28

The desired output is `vec`, grouped into contiguous sequences:

out <- list(1:3,5,8:14,16,19,22:28)

This works:

nv  <- length(vec)

splits <- 1 + which(diff(vec) != 1)
splits <- c(1,splits,nv+1)
nsp <- length(splits)

out <- list()
for (i in 1:(nsp-1)){
out[[i]] <- vec[splits[i]:(splits[i+1]-1)]
}

I am guessing there is a cleaner way to do this in base R...? I am not yet comfortable with the `rle` and `cumsum` tricks I have seen on SO...

Best answer

Here is a `cumsum` "trick" for you:

split(vec, cumsum(c(1, diff(vec)) - 1))
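As a quick check against the question's data, the sketch below hard-codes `vec` as printed in the question (so the result does not depend on the random number generator) and compares the one-liner's result with the desired `out`. The `unname()` call and the intermediate names `groups` and `res` are additions for illustration; the one-liner relies on `vec` being strictly increasing.

```r
# vec as printed in the question (hard-coded, so no dependence on the RNG)
vec <- c(1:3, 5L, 8:14, 16L, 19L, 22:28)

# Desired grouping from the question
out <- list(1:3, 5L, 8:14, 16L, 19L, 22:28)

# diff(vec) is 1 inside a run and larger at a gap, so the cumulative sum
# of (diff - 1) stays constant within a run and increases at every gap
groups <- cumsum(c(1, diff(vec)) - 1)
res <- split(vec, groups)

# split() names each piece by its group id; drop the names before comparing
identical(unname(res), out)
```

The group ids themselves are not consecutive (a gap of 3 adds 2 to the running sum), which does not matter because `split()` only needs them to differ between runs.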

Update

Here is a simple example using your version, `split(vec, cumsum(c(0, diff(vec) > 1)))`, with each step broken out:

vec <- c(1:3,7:9)            # 1 2 3 7 8 9   (sample with two contiguous sequences)
diff(vec)                    # 1 1 4 1 1     (lagged difference)
diff(vec) > 1                # F F T F F     (not contiguous where diff > 1)
                             # 0 0 1 0 0     (numeric equivalent for T/F)
c(0, diff(vec) > 1)          # 0 0 0 1 0 0   (pad with 0 to align with original vector)
cumsum(c(0, diff(vec) > 1))  # 0 0 0 1 1 1   (cumulative sum of logical values)

groups <- cumsum(c(0, diff(vec) > 1)) # 0 0 0 1 1 1

sets <- split(vec, groups) # split into groups named by cumulative sum

sets
# $`0`
# [1] 1 2 3
#
# $`1`
# [1] 7 8 9
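The steps above could be wrapped in a small reusable function. The following is only a sketch: the name `split_contiguous` is hypothetical and not part of the original answer, and it deliberately tests `diff(x) != 1` rather than `diff(x) > 1`, so that any step other than +1 starts a new group. It also returns an empty list for empty input and drops the group names.

```r
# Hypothetical helper (not from the original answer) wrapping the steps above
split_contiguous <- function(x) {
  # Nothing to split: return an empty list
  if (length(x) == 0L) return(list())

  # A new group starts at the first element and wherever the step
  # from the previous element is not exactly 1
  starts <- c(TRUE, diff(x) != 1)

  # cumsum() over the logical vector numbers the groups 1, 2, 3, ...
  unname(split(x, cumsum(starts)))
}

split_contiguous(c(1:3, 7:9))   # two runs: 1 2 3 and 7 8 9
split_contiguous(c(4L, 10L))    # two single-element groups
```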

Then, if for some reason you want to print it out:

# Create strings representing each contiguous range
set_strings <- sapply(sets, function(x) paste0(min(x),":",max(x)))

set_strings
# 0 1
# "1:3" "7:9"

# Print out a concise representation of all contiguous sequences
print(paste0(set_strings,collapse=","))

# [1] "1:3,7:9"
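With the question's original vector, some groups hold a single value, and the `min:max` formatting above would render them as, for example, "5:5". One possible variation, sketched here with a hypothetical helper `fmt_range` and with `sets` hard-coded to the grouping from the question, prints single values on their own:

```r
# Grouping from the question, hard-coded for illustration
sets <- list(1:3, 5L, 8:14, 16L, 19L, 22:28)

# Hypothetical formatter: a lone value prints as itself, a run as "min:max"
fmt_range <- function(x) {
  if (length(x) == 1L) as.character(x) else paste0(min(x), ":", max(x))
}

# vapply() guarantees one string per group
set_strings <- vapply(sets, fmt_range, character(1))

paste0(set_strings, collapse = ",")
# [1] "1:3,5,8:14,16,19,22:28"
```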

This post on "r - split (1:n)[boolean] into contiguous sequences" is based on a question found on Stack Overflow: https://stackoverflow.com/questions/16800803/
