r - Find consecutive occurrences in a binary sequence

Repost. Author: 行者123. Updated: 2023-12-02 07:19:15

I have a binary sequence that looks like this:

set.seed(1)
n <- 1000
x <- sample(c(0, 1), n, replace = TRUE)

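How can I count the number of times 1 occurs exactly 2 times in a row, exactly 3 times in a row, and so on? For example, I can count occurrences of at least 2 consecutive ones using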
length(which((x[-1] == 1) & (diff(x) == 0)))
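
To see what this expression actually counts, here is a small sketch on a hand-checkable vector. The vector y and the names pair_count and run_count are made up for illustration. The expression counts adjacent pairs of ones, so a run of k ones contributes k - 1, which is not the same as the number of runs of length at least 2.

```r
# Hypothetical test vector: the runs of ones have lengths 2, 3 and 1
y <- c(1, 1, 0, 1, 1, 1, 0, 1)

# Expression from the question: counts positions whose value is 1
# and equals the previous value, i.e. adjacent (1, 1) pairs
pair_count <- length(which((y[-1] == 1) & (diff(y) == 0)))

# Number of runs of ones with length >= 2, via base R run-length encoding
r <- rle(y)
run_count <- sum(r$values == 1 & r$lengths >= 2)

pair_count  # 3: (2 - 1) + (3 - 1) + (1 - 1)
run_count   # 2
```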

Best answer

We can use run-length encoding (rle) and wrap it in a function

with(rle(x), sum(values == 1 & lengths == 2))

fn_len <- function(vec, val, n) {
  with(rle(vec), sum(values == val & lengths == n))
}

fn_len(x, 1, 2)
#[1] 63
fn_len(x, 1, 3)
#[1] 34
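
As a rough illustration of why this works, the sketch below runs rle on a short vector (y is a made-up example, not part of the original answer). rle returns two parallel vectors, lengths and values, with one entry per run, so the test values == 1 & lengths == n picks out the runs of ones that are exactly n long.

```r
# Hypothetical test vector with runs: 1 1 | 0 | 1 1 1 | 0 | 1
y <- c(1, 1, 0, 1, 1, 1, 0, 1)

r <- rle(y)
r$lengths  # 2 1 3 1 1  (length of each run)
r$values   # 1 0 1 0 1  (value repeated in each run)

# Same test as in fn_len, written out step by step
n_runs_of_2 <- sum(r$values == 1 & r$lengths == 2)  # 1
n_runs_of_3 <- sum(r$values == 1 & r$lengths == 3)  # 1
```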

If we need the counts for several run lengths

sapply(2:5, fn_len, vec = x, val = 1)
#[1] 63 34 19 7
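
The sapply call above reruns rle once per run length. A possible base R alternative, shown here only as a sketch and not benchmarked, is to call rle once and tabulate the lengths of the runs of ones. The names y, one_runs and counts are illustrative.

```r
# Hypothetical test vector: the runs of ones have lengths 2, 3 and 2
y <- c(1, 1, 0, 1, 1, 1, 0, 1, 1)

# Run-length encode once, keep only the runs whose value is 1
r <- rle(y)
one_runs <- r$lengths[r$values == 1]

# One table gives the count for every run length that occurs
counts <- table(one_runs)
counts

# Look up a single run length by its name
counts[["2"]]  # 2
counts[["3"]]  # 1
```

Run lengths that never occur are simply absent from the table, so a lookup for a missing length needs its own handling.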

Another option is rleid from data.table

library(data.table)
data.table(x)[, .N, .(x, rleid(x))][x==1, sum(N==2)]
#[1] 63
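
In the data.table call, rleid(x) assigns an id to each run, grouping by (x, rleid(x)) makes .N the run length, and the final step counts the runs of ones with N == 2. The sketch below mimics the run ids in base R, purely for illustration; it is not how data.table implements rleid, and the names y, run_id, run_len and run_val are made up.

```r
# Hypothetical test vector with runs: 1 1 | 0 | 1 1 1 | 0 | 1
y <- c(1, 1, 0, 1, 1, 1, 0, 1)

# A new run starts wherever an element differs from its predecessor;
# the cumulative sum of those change points numbers the runs from 1
run_id <- cumsum(c(TRUE, y[-1] != y[-length(y)]))
run_id  # 1 1 2 3 3 3 4 5

# Per-run length and value, analogous to .N and x in the grouped result
run_len <- tapply(y, run_id, length)
run_val <- tapply(y, run_id, function(v) v[1])

# Count runs of ones that are exactly 2 long
n_runs_of_2 <- sum(run_val == 1 & run_len == 2)  # 1
```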

Benchmarks

set.seed(1)
n <- 1e7
x <- sample(c(0, 1), n, replace = TRUE)

system.time(out1 <- table(scan(text = gsub("0+", ";", paste0(x, collapse = "")),
                               sep = ";", quiet = TRUE))[2])
# user system elapsed
# 11.818 0.152 11.976

system.time(out2 <- table(strsplit(gsub("0+",";",paste0(x,collapse="")),
";")[[1]])[3])
# user system elapsed
#10.708 0.200 10.913

system.time(fn_len(x, 1, 2))
# user system elapsed
# 0.671 0.399 1.073

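If we want the counts for several values of 'n' at once, the data.table approach is faster, since the sapply version calls rle again for every 'n'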

system.time(data.table(x)[, .N, .(x, rleid(x))][x==1, .N, N])
# user system elapsed
# 2.246 0.285 2.561

system.time(sapply(2:21, fn_len, vec = x, val = 1))
# user system elapsed
# 14.171 6.103 20.323

system.time(table(strsplit(gsub("0+",";",paste0(x,collapse="")),";")[[1]]))
# user system elapsed
# 10.570 0.192 10.770

Regarding r - Find consecutive occurrences in a binary sequence, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50753254/
