
r - Find the most frequently repeated sequence of values

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 02:42:45

Suppose I have a data frame that looks like this:

   ITEM
1     X
2     A
3     B
4     C
5     A
6     F
7     U
8     A
9     B
10    C
11    F
12    U

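How can I get the most common sequence of values in the "ITEM" column? In this case, the most common sequence is A, B, C, because it appears in rows 2 to 4 and again in rows 8 to 10.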

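I have already tried the function rle, as well as some solutions found here, but without luck. Could I get suggestions, hints, or package recommendations?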

Best answer

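I am guessing you want the longest repeated, non-overlapping substring. There are some good explanations of the dynamic programming solution here.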

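x = c("X", "A", "B", "C", "A", "F", "U", "A", "B", "C", "F", "U")
n = length(x)
# m1[i, j] is 1 when x[i] == x[j], otherwise 0
m1 = sapply(x, function(i) sapply(x, function(j) as.integer(i == j)))
# Drop self-matches and the mirrored lower triangle, so only pairs with i < j remain
diag(m1) = 0
m1[lower.tri(m1)] = 0
m1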
#   X A B C A F U A B C F U
# X 0 0 0 0 0 0 0 0 0 0 0 0
# A 0 0 0 0 1 0 0 1 0 0 0 0
# B 0 0 0 0 0 0 0 0 1 0 0 0
# C 0 0 0 0 0 0 0 0 0 1 0 0
# A 0 0 0 0 0 0 0 1 0 0 0 0
# F 0 0 0 0 0 0 0 0 0 0 1 0
# U 0 0 0 0 0 0 0 0 0 0 0 1
# A 0 0 0 0 0 0 0 0 0 0 0 0
# B 0 0 0 0 0 0 0 0 0 0 0 0
# C 0 0 0 0 0 0 0 0 0 0 0 0
# F 0 0 0 0 0 0 0 0 0 0 0 0
# U 0 0 0 0 0 0 0 0 0 0 0 0
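As a side note, the same match matrix can probably be built without the nested sapply calls. The sketch below is an assumption about an equivalent construction, not part of the original answer: it uses base R's outer() and upper.tri(), and the names m1_sapply and m1_outer are made up for the comparison.

```r
x <- c("X", "A", "B", "C", "A", "F", "U", "A", "B", "C", "F", "U")

# Construction used in the answer: compare every pair with nested sapply.
m1_sapply <- sapply(x, function(i) sapply(x, function(j) as.integer(i == j)))
m1_sapply[!upper.tri(m1_sapply)] <- 0L  # zero the diagonal and lower triangle

# Vectorised alternative (illustrative): outer() compares all pairs in one call.
m1_outer <- outer(x, x, "==") * 1L      # logical matrix converted to 0/1
dimnames(m1_outer) <- list(x, x)        # outer() adds no labels for unnamed input
m1_outer[!upper.tri(m1_outer)] <- 0L

# Both constructions should agree cell by cell.
stopifnot(all(m1_outer == m1_sapply))
```

For long vectors either version needs an n-by-n matrix, so memory grows quadratically with the number of rows.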

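# m2[i, j] accumulates the length of the matching run that ends at positions i and j
m2 = m1
for (i in 2:nrow(m1)) {
  for (j in 2:nrow(m1)) {
    # Extend a run only when the previous pair (i - 1, j - 1) matched as well
    if (m1[i - 1, j - 1] == 1 && m1[i, j] == 1) {
      # j - i is the distance between the two occurrences; a longer run would overlap
      if (j - i > m2[i - 1, j - 1]) {
        m2[i, j] = m2[i - 1, j - 1] + m2[i, j]
        m2[i - 1, j - 1] = 0  # keep the count only at the end of the run
      } else {
        m2[i, j] = 0
      }
    }
  }
}
m2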
#   X A B C A F U A B C F U
# X 0 0 0 0 0 0 0 0 0 0 0 0
# A 0 0 0 0 1 0 0 0 0 0 0 0
# B 0 0 0 0 0 0 0 0 0 0 0 0
# C 0 0 0 0 0 0 0 0 0 3 0 0
# A 0 0 0 0 0 0 0 1 0 0 0 0
# F 0 0 0 0 0 0 0 0 0 0 0 0
# U 0 0 0 0 0 0 0 0 0 0 0 2
# A 0 0 0 0 0 0 0 0 0 0 0 0
# B 0 0 0 0 0 0 0 0 0 0 0 0
# C 0 0 0 0 0 0 0 0 0 0 0 0
# F 0 0 0 0 0 0 0 0 0 0 0 0
# U 0 0 0 0 0 0 0 0 0 0 0 0

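# The largest entry is the length of the longest repeated run
ans_len = max(m2)
# Its column index is the position in x where the second occurrence ends
inds = c(which(m2 == ans_len, arr.ind = TRUE)[,2])
lapply(inds, function(ind) x[(ind - ans_len + 1):ind])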
# [[1]]
# [1] "A" "B" "C"
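The answer above reads the question as "longest repeated sequence". If "most common" is instead meant literally, for a fixed sequence length, one possible approach is to count every window of that length. This is a minimal sketch under that assumption: count_ngrams is a hypothetical helper, not a function from any package, and it joins elements with a space, so it assumes the values themselves contain no spaces. Overlapping occurrences are counted separately.

```r
x <- c("X", "A", "B", "C", "A", "F", "U", "A", "B", "C", "F", "U")

# Hypothetical helper: tabulate every contiguous window of length k in x.
count_ngrams <- function(x, k) {
  n <- length(x)
  if (k < 1 || k > n) return(table(character(0)))
  starts <- seq_len(n - k + 1)
  # Paste each window into a single string so that table() can count it.
  grams <- vapply(
    starts,
    function(s) paste(x[s:(s + k - 1)], collapse = " "),
    character(1)
  )
  sort(table(grams), decreasing = TRUE)
}

counts3 <- count_ngrams(x, 3)
# Keep only the windows that reach the highest count.
counts3[counts3 == max(counts3)]
```

With the example data, the only length-3 window that occurs twice is "A B C", which matches the result of the dynamic programming answer. Calling count_ngrams(x, 2) instead shows three pairs tied at two occurrences each, so the choice of k matters under this reading.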

Regarding "r - Find the most frequently repeated sequence of values", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54895662/
