algorithm - Finding the minimum-length RLE

Reposted by: 塔克拉玛干. Updated: 2023-11-03 02:40:31

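The classic RLE algorithm compresses data by using a number to indicate how many times the character following it occurs at that position in the text. For example: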

AAABBAAABBCECE => 3A2B3A2B1C1E1C1E
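
To make the classic scheme concrete, here is a minimal Python sketch of it. The function name classic_rle is chosen for illustration only and does not come from the question.

```python
from itertools import groupby


def classic_rle(text):
    """Encode each run of identical characters as <count><char>."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


print(classic_rle("AAABBAAABBCECE"))  # 3A2B3A2B1C1E1C1E
```

The result has 16 characters for a 14-character input, which is the expansion the question complains about.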

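However, in the example above this method makes the compressed text take up more space than the original. A better idea is to use a number to indicate how many times the substring following it occurs in the given text. For example: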

AAABBAAABBCECE => 2AAABB2CE ("AAABB" twice, then "CE" twice).
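
One way to check that such an encoding is valid is to decode it again. The sketch below assumes that the encoded substrings never contain digits, so that a run of digits always marks the start of a new block. The name decode_substring_rle is hypothetical.

```python
import re


def decode_substring_rle(encoded):
    """Expand <count><substring> blocks.

    Assumption: substrings contain no digits, so every run of digits
    starts a new block.
    """
    blocks = re.findall(r"(\d+)(\D+)", encoded)
    return "".join(sub * int(count) for count, sub in blocks)


print(decode_substring_rle("2AAABB2CE"))  # AAABBAAABBCECE
```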

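Now, my question is: how can I implement an efficient algorithm that uses this approach to find the minimum number of characters in an optimal RLE? A brute-force method exists, but I need something faster (at most O(length²)). Maybe we can use dynamic programming?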
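
For reference, a brute-force baseline could look like the sketch below. It is not taken from the question. It assumes a cost model in which every block costs one character for its count plus the length of the repeated substring, and it tries every way of splitting the text into repeated blocks. The name min_rle_length_bruteforce is made up for this example.

```python
from functools import lru_cache


def min_rle_length_bruteforce(s):
    """Minimum encoded length, trying every split into repeated blocks.

    Assumed cost model: each block costs 1 character for the count
    plus the length of the repeated substring.
    """
    n = len(s)

    @lru_cache(maxsize=None)
    def best(start):
        # Cheapest encoding of the suffix s[start:].
        if start == n:
            return 0
        result = float("inf")
        for k in range(1, n - start + 1):  # length of the repeated substring
            pattern = s[start:start + k]
            end = start + k
            # Extend the block one repetition at a time.
            while end <= n and s[end - k:end] == pattern:
                result = min(result, 1 + k + best(end))
                end += k
        return result

    return best(0)


print(min_rle_length_bruteforce("AAABBAAABBCECE"))  # 9, e.g. 2AAABB2CE
```

Even with memoization this takes roughly cubic time in the worst case, and the recursion depth grows with the input length, so it is only useful for cross-checking a faster solution on short strings.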

Best answer

It can be done in quadratic time via dynamic programming.

Here is some Python code:

import sys
import numpy as np

# sentinel cost; larger than any real cost for inputs shorter than ~10000 characters
bignum = 10000

S = sys.argv[1]  # e.g. 'AAABBAAABBCECE'
N = len(S)

# maxmatch[i,j] = length of the longest common prefix of S[i:] and S[j:]
maxmatch = np.zeros((N+1, N+1), dtype=int)

for i in range(N-1, -1, -1):
    for j in range(i+1, N):
        if S[i] == S[j]:
            maxmatch[i,j] = maxmatch[i+1,j+1] + 1

# P[n,k] = cost of encoding first n characters given that last k are a block
P = np.zeros((N+1, N+1), dtype=int) + bignum
# Q[n] = cost of encoding first n characters
Q = np.zeros(N+1, dtype=int) + bignum

# base case: no cost for empty string
P[0,0] = 0
Q[0] = 0

for n in range(1, N+1):
    for k in range(1, n+1):
        if n-2*k >= 0:
            # equivalent to: S[n-k:n] == S[n-2*k:n-k]
            if maxmatch[n-2*k,n-k] >= k:
                # Here we are incrementing the count: C x_1...x_k -> C+1 x_1...x_k
                P[n,k] = min(P[n,k], P[n-k,k])
                print('P[%d,%d] = %d' % (n, k, P[n,k]))
        # Here we are starting a new block: 1 x_1...x_k
        P[n,k] = min(P[n,k], Q[n-k] + 1 + k)
        print('P[%d,%d] = %d' % (n, k, P[n,k]))
    for k in range(1, n+1):
        Q[n] = min(Q[n], P[n,k])

    print()

print(Q[N])

You can reconstruct the actual encoding by remembering the choices you made along the way.
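
As a sketch of that reconstruction, the version below runs the same recurrence with plain Python lists and records, for every table entry, how many repetitions the last block has. The names min_rle_encoding, reps and last are illustrative and not part of the answer's code. Counts are assumed to cost one character each, as in the recurrence above.

```python
def min_rle_encoding(s):
    """Return one minimum-cost encoding of s as a string of <count><substring> blocks."""
    n = len(s)
    # match[i][j] = length of the longest common prefix of s[i:] and s[j:]
    match = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            if s[i] == s[j]:
                match[i][j] = match[i + 1][j + 1] + 1

    inf = float("inf")
    P = [[inf] * (n + 1) for _ in range(n + 1)]   # P[m][k]: cost of s[:m], last block repeats k chars
    reps = [[0] * (n + 1) for _ in range(n + 1)]  # repetitions of that last block
    Q = [inf] * (n + 1)                           # Q[m]: best cost for s[:m]
    last = [0] * (n + 1)                          # k of the last block in a best encoding of s[:m]
    Q[0] = 0

    for m in range(1, n + 1):
        for k in range(1, m + 1):
            # Start a new block: 1 x_1...x_k
            P[m][k], reps[m][k] = Q[m - k] + 1 + k, 1
            # Or increment the count of an identical block that ends at m - k
            if (m - 2 * k >= 0 and match[m - 2 * k][m - k] >= k
                    and P[m - k][k] < P[m][k]):
                P[m][k], reps[m][k] = P[m - k][k], reps[m - k][k] + 1
            if P[m][k] < Q[m]:
                Q[m], last[m] = P[m][k], k

    # Walk the recorded choices backwards from the end of the string.
    blocks = []
    m = n
    while m > 0:
        k = last[m]
        r = reps[m][k]
        blocks.append(f"{r}{s[m - k:m]}")
        m -= r * k
    return "".join(reversed(blocks))


print(min_rle_encoding("AAABBAAABBCECE"))  # 2AAABB2CE
```

When several encodings share the minimum cost, this sketch returns whichever one the strict comparisons happen to keep.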

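I glossed over one minor detail: if C is large, we may need an extra byte to hold C+1. If you are using 32-bit integers, this will not come up in any context where this algorithm's running time is feasible. If you sometimes use shorter integers to save space, you will have to think about it a bit, and perhaps add another dimension to the table based on the size of the latest C. In theory this might add a log(N) factor, but I don't think it would be noticeable in practice.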
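
To illustrate the effect, the sketch below measures an encoding whose counts are written as decimal digits. That is one possible representation and an assumption made here; the answer itself talks about bytes. The helper encoded_length and its list-of-pairs input are hypothetical.

```python
def encoded_length(blocks):
    """Length of an encoding whose counts are written in decimal.

    `blocks` is a list of (count, substring) pairs; both this
    representation and the decimal counts are assumptions.
    """
    return sum(len(str(count)) + len(sub) for count, sub in blocks)


print(encoded_length([(9, "A")]))   # 2, for "9A"
print(encoded_length([(10, "A")]))  # 3, for "10A"
```

The recurrence above charges a fixed cost of 1 per count, so it would underestimate the second case by one character.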

Edit: for @Moron's benefit, here is the same code with more print statements, so that you can more easily see what the algorithm is thinking:

import sys
import numpy as np

# sentinel cost; larger than any real cost for inputs shorter than ~10000 characters
bignum = 10000

S = sys.argv[1]  # e.g. 'AAABBAAABBCECE'
N = len(S)

# maxmatch[i,j] = length of the longest common prefix of S[i:] and S[j:]
maxmatch = np.zeros((N+1, N+1), dtype=int)

for i in range(N-1, -1, -1):
    for j in range(i+1, N):
        if S[i] == S[j]:
            maxmatch[i,j] = maxmatch[i+1,j+1] + 1

# P[n,k] = cost of encoding first n characters given that last k are a block
P = np.zeros((N+1, N+1), dtype=int) + bignum
# Q[n] = cost of encoding first n characters
Q = np.zeros(N+1, dtype=int) + bignum

# base case: no cost for empty string
P[0,0] = 0
Q[0] = 0

for n in range(1, N+1):
    for k in range(1, n+1):
        if n-2*k >= 0:
            # equivalent to: S[n-k:n] == S[n-2*k:n-k]
            if maxmatch[n-2*k,n-k] >= k:
                # Here we are incrementing the count: C x_1...x_k -> C+1 x_1...x_k
                P[n,k] = min(P[n,k], P[n-k,k])
                print("P[%d,%d] = %d\t I can encode first %d characters of S in only %d characters "
                      "if I use my solution for P[%d,%d] with %s's count incremented"
                      % (n, k, P[n,k], n, P[n-k,k], n-k, k, S[n-k:n]))
        # Here we are starting a new block: 1 x_1...x_k
        P[n,k] = min(P[n,k], Q[n-k] + 1 + k)
        print('P[%d,%d] = %d\t I can encode first %d characters of S in only %d characters '
              'if I use my solution for Q[%d] with a new block 1%s'
              % (n, k, P[n,k], n, Q[n-k]+1+k, n-k, S[n-k:n]))
    for k in range(1, n+1):
        Q[n] = min(Q[n], P[n,k])

    print()
    print('Q[%d] = %d\t I can encode first %d characters of S in only %d characters!' % (n, Q[n], n, Q[n]))
    print()


print(Q[N])

The last few lines of its output on ABCDABCDABCDBCD look like this:

Q[13] = 7        I can encode first 13 characters of S in only 7 characters!

P[14,1] = 9      I can encode first 14 characters of S in only 9 characters if I use my solution for Q[13] with a new block 1C
P[14,2] = 8      I can encode first 14 characters of S in only 8 characters if I use my solution for Q[12] with a new block 1BC
P[14,3] = 13     I can encode first 14 characters of S in only 13 characters if I use my solution for Q[11] with a new block 1DBC
P[14,4] = 13     I can encode first 14 characters of S in only 13 characters if I use my solution for Q[10] with a new block 1CDBC
P[14,5] = 13     I can encode first 14 characters of S in only 13 characters if I use my solution for Q[9] with a new block 1BCDBC
P[14,6] = 12     I can encode first 14 characters of S in only 12 characters if I use my solution for Q[8] with a new block 1ABCDBC
P[14,7] = 16     I can encode first 14 characters of S in only 16 characters if I use my solution for Q[7] with a new block 1DABCDBC
P[14,8] = 16     I can encode first 14 characters of S in only 16 characters if I use my solution for Q[6] with a new block 1CDABCDBC
P[14,9] = 16     I can encode first 14 characters of S in only 16 characters if I use my solution for Q[5] with a new block 1BCDABCDBC
P[14,10] = 16    I can encode first 14 characters of S in only 16 characters if I use my solution for Q[4] with a new block 1ABCDABCDBC
P[14,11] = 16    I can encode first 14 characters of S in only 16 characters if I use my solution for Q[3] with a new block 1DABCDABCDBC
P[14,12] = 16    I can encode first 14 characters of S in only 16 characters if I use my solution for Q[2] with a new block 1CDABCDABCDBC
P[14,13] = 16    I can encode first 14 characters of S in only 16 characters if I use my solution for Q[1] with a new block 1BCDABCDABCDBC
P[14,14] = 15    I can encode first 14 characters of S in only 15 characters if I use my solution for Q[0] with a new block 1ABCDABCDABCDBC

Q[14] = 8        I can encode first 14 characters of S in only 8 characters!

P[15,1] = 10     I can encode first 15 characters of S in only 10 characters if I use my solution for Q[14] with a new block 1D
P[15,2] = 10     I can encode first 15 characters of S in only 10 characters if I use my solution for Q[13] with a new block 1CD
P[15,3] = 11     I can encode first 15 characters of S in only 11 characters if I use my solution for P[12,3] with BCD's count incremented
P[15,3] = 9      I can encode first 15 characters of S in only 9 characters if I use my solution for Q[12] with a new block 1BCD
P[15,4] = 14     I can encode first 15 characters of S in only 14 characters if I use my solution for Q[11] with a new block 1DBCD
P[15,5] = 14     I can encode first 15 characters of S in only 14 characters if I use my solution for Q[10] with a new block 1CDBCD
P[15,6] = 14     I can encode first 15 characters of S in only 14 characters if I use my solution for Q[9] with a new block 1BCDBCD
P[15,7] = 13     I can encode first 15 characters of S in only 13 characters if I use my solution for Q[8] with a new block 1ABCDBCD
P[15,8] = 17     I can encode first 15 characters of S in only 17 characters if I use my solution for Q[7] with a new block 1DABCDBCD
P[15,9] = 17     I can encode first 15 characters of S in only 17 characters if I use my solution for Q[6] with a new block 1CDABCDBCD
P[15,10] = 17    I can encode first 15 characters of S in only 17 characters if I use my solution for Q[5] with a new block 1BCDABCDBCD
P[15,11] = 17    I can encode first 15 characters of S in only 17 characters if I use my solution for Q[4] with a new block 1ABCDABCDBCD
P[15,12] = 17    I can encode first 15 characters of S in only 17 characters if I use my solution for Q[3] with a new block 1DABCDABCDBCD
P[15,13] = 17    I can encode first 15 characters of S in only 17 characters if I use my solution for Q[2] with a new block 1CDABCDABCDBCD
P[15,14] = 17    I can encode first 15 characters of S in only 17 characters if I use my solution for Q[1] with a new block 1BCDABCDABCDBCD
P[15,15] = 16    I can encode first 15 characters of S in only 16 characters if I use my solution for Q[0] with a new block 1ABCDABCDABCDBCD

Q[15] = 9        I can encode first 15 characters of S in only 9 characters!

This post on algorithm - Finding the minimum-length RLE is based on a similar question we found on Stack Overflow: https://stackoverflow.com/questions/2261318/
