r - How to find the longest consecutive run in a long sequence in R

Reposted. Author: 行者123. Last updated: 2023-12-04 12:34:55

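I have a sequence as a toy example.
How can I determine the longest consecutive subsequence?
So far I can find where the breakpoints are, but how do I get the values themselves?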

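library(data.table)
DT <- data.table(X = c(3:7, 16:18, 22:29, 31:36))
# Y = previous X minus current X; it equals -1 inside a consecutive run
DT[,Y:=(shift(.SD,type = "lag", fill = -1))][,Y:= Y-X]
with(DT, which(Y !=-1))  # row indices where a new run starts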

What I want is the values of that subsequence, which in this case should be c(22, 23, 24, 25, 26, 27, 28, 29)

Best answer

Another option that should be faster when the data gets large:

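DT[DT[, {
    rl <- cumsum(c(1L, diff(X)>1L))  # run id: increments at each gap (X assumed sorted, no duplicates)
    rw <- rowid(rl)                  # position of each row within its run
    .I[rl==rl[which.max(rw)]]        # row numbers of the first longest run
}]]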
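
To make the run-id idea above easier to follow, here is a minimal base R sketch on the toy vector from the question. It is only an illustration, not the answer's data.table method: the names `x`, `run_id`, `run_len`, `longest`, and `longest_run` are chosen for this example, and the vector is assumed to be sorted with no duplicates.

```r
# Toy vector from the question; assumed sorted and free of duplicates.
x <- c(3:7, 16:18, 22:29, 31:36)

# A new run starts wherever the gap to the previous value is not 1,
# so the cumulative sum of those flags labels each run with an id.
run_id <- cumsum(c(TRUE, diff(x) != 1L))

# Length of each run, then the id of the first longest one.
run_len <- tabulate(run_id)
longest <- which.max(run_len)

# Values belonging to that run.
longest_run <- x[run_id == longest]
print(longest_run)
# [1] 22 23 24 25 26 27 28 29
```

Note that `which.max()` returns only the first maximum, so if several runs share the maximum length this sketch keeps the earliest one.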

Timing code:
library(data.table)
set.seed(0L)
nr <- 1e7
ngap <- nr/2
DT <- data.table(X=sample(nr, ngap))
setorder(DT, X)

mtd0 <- function() {
    DT[, length := .N, by = cumsum(c(1, diff(X) != 1))][length == max(length), X]
}

mtd1 <- function() {
    ls <- split(DT$X, cumsum(c(TRUE, diff(DT$X) != 1)))
    DT[X %in% ls[[which.max(lengths(ls))]], X]
}

mtd2 <- function() {
    DT[DT[, {
        rl <- cumsum(c(1L, diff(X)>1L))
        rw <- rowid(rl)
        .I[rl==rl[which.max(rw)]]}], X]
}

bench::mark(mtd0(), mtd1(), mtd2(), check=FALSE)

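Output (mtd0() returns both runs that tie for the maximum length of 22, while mtd1() and mtd2() return only the first one, so the results differ and bench::mark is called with check=FALSE):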
> mtd0()
[1] 4622514 4622515 4622516 4622517 4622518 4622519 4622520 4622521 4622522 4622523 4622524 4622525 4622526 4622527 4622528 4622529 4622530 4622531 4622532
[20] 4622533 4622534 4622535 8390357 8390358 8390359 8390360 8390361 8390362 8390363 8390364 8390365 8390366 8390367 8390368 8390369 8390370 8390371 8390372
[39] 8390373 8390374 8390375 8390376 8390377 8390378
> mtd1()
[1] 4622514 4622515 4622516 4622517 4622518 4622519 4622520 4622521 4622522 4622523 4622524 4622525 4622526 4622527 4622528 4622529 4622530 4622531 4622532
[20] 4622533 4622534 4622535
> mtd2()
[1] 4622514 4622515 4622516 4622517 4622518 4622519 4622520 4622521 4622522 4622523 4622524 4622525 4622526 4622527 4622528 4622529 4622530 4622531 4622532
[20] 4622533 4622534 4622535
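
In the output above, mtd0() returns 44 values (two runs of 22) while the other two return 22, which suggests the methods differ in how they treat ties. The following base R sketch shows both behaviours on a small made-up vector; the data and the names `first_longest` and `all_longest` are assumptions for illustration only and are not part of the original answer.

```r
# Hypothetical data with two runs tied at length 3 (1:3 and 10:12).
x <- c(1:3, 10:12, 20L)

# Label runs and count their lengths (x assumed sorted, no duplicates).
run_id <- cumsum(c(TRUE, diff(x) != 1L))
run_len <- tabulate(run_id)

# First longest run only: which.max() stops at the first maximum.
first_longest <- x[run_id == which.max(run_len)]

# Every run tied for the maximum length, kept as a list of vectors.
all_longest <- split(x, run_id)[run_len == max(run_len)]

print(first_longest)
print(all_longest)
```

Which behaviour is preferable depends on whether the caller needs a single run or all tied runs.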

Timings:
# A tibble: 3 x 13
  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result     memory                time     gc
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>     <list>                <list>   <list>
1 mtd0()        1.34s    1.34s     0.747     363MB     1.49     1     2      1.34s <int [44]> <df[,3] [42 x 3]>     <bch:tm> <tibble [1 x 3]>
2 mtd1()        2.13s    2.13s     0.470     548MB     1.88     1     4      2.13s <int [22]> <df[,3] [34,671 x 3]> <bch:tm> <tibble [1 x 3]>
3 mtd2()     642.91ms 642.91ms     1.56      343MB     4.67     1     3   642.91ms <int [22]> <df[,3] [29 x 3]>     <bch:tm> <tibble [1 x 3]>

Regarding "r - How to find the longest consecutive run in a long sequence in R", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58897687/
