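I'm currently working on a short-text compression project for my own language. As a beginner, I know some basic compression algorithms such as LZW, but I still don't understand how smaz works. I have two questions:
- How does smaz work?
- How are the codebook and the reverse codebook built?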
Could anyone help explain this?
Thanks a lot.
Best answer
Trying to answer your questions:
How does smaz work? According to [1]:
Smaz has a hard-wired constant built-in codebook of 254 common English words, word fragments, bigrams, and the lowercase letters (except j, k, q). The inner loop of the Smaz decoder is very simple:
- Fetch the next byte X from the compressed file.
- Is X == 254? Single byte literal: fetch the next byte L, and pass it straight through to the decoded text.
- Is X == 255? Literal string: fetch the next byte L, then pass the following L+1 bytes straight through to the decoded text.
- Any other value of X: lookup the X'th "word" in the codebook (that "word" can be from 1 to 5 letters), and copy that word to the decoded text.
- Repeat until there are no more compressed bytes left in the compressed file.
Because the codebook is constant, the Smaz decoder is unable to "learn" new words and compress them, no matter how often they appear in the original text.
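The decoding loop quoted above is simple enough to sketch directly. The following is a minimal Python sketch of that loop, not smaz's own C code; `CODEBOOK` is a hypothetical 10-entry toy table standing in for the real 254-entry one, and `decode` is an illustrative name.

```python
# Hypothetical toy codebook: the real smaz table has 254 entries (codes 0..253).
CODEBOOK = [b" ", b"the", b"e", b"t", b"a", b"of", b"o", b"and", b"i", b"n"]


def decode(data: bytes, codebook: list[bytes]) -> bytes:
    """Decode a smaz-style byte stream following the loop quoted above."""
    out = bytearray()
    i = 0
    while i < len(data):
        x = data[i]
        if x == 254:
            # Single-byte literal: copy the next byte straight through.
            out.append(data[i + 1])
            i += 2
        elif x == 255:
            # Literal string: next byte L, then L + 1 bytes copied verbatim.
            length = data[i + 1] + 1
            out.extend(data[i + 2 : i + 2 + length])
            i += 2 + length
        else:
            # Any other value is an index into the codebook.
            out.extend(codebook[x])
            i += 1
    return bytes(out)
```

With this toy table, `decode(bytes([1, 0, 254, ord("c"), 4, 3]), CODEBOOK)` yields `b"the cat"`: codes 1 and 0 give "the" and " ", 254 escapes the literal "c", and codes 4 and 3 give "a" and "t". The sketch does not validate truncated input.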
This page may help with understanding the code.
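How are the codebook and the reverse codebook built? According to the TODO file in the repository and the author's comments on Reddit, the dictionary was generated by an unreleased Ruby script. The author also explained: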
btw what the Ruby program does is to consider all the possible substrings, and even all the possible separated words, and build a table of frequencies, than adjust the weight based on the string length, and finally hand tuning the table to compress specific things very well. I added by hand the "http://" and ".com" token for example, removing the final two entries.
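Since the Ruby script was never published, the following is only a rough Python approximation of the procedure described in that quote, under stated assumptions. It counts every substring of 1 to 5 bytes in a sample corpus, scores each by count × length (an assumed reading of "adjust the weight based on the string length"), and keeps the top entries; it skips the separated-word analysis and the hand tuning. A plain dict serves as the reverse codebook, mapping each entry back to its code byte, and a greedy longest-match encoder uses it, emitting unmatched bytes with the same 254/255 literal escapes the decoder expects. `build_codebook` and `encode` are hypothetical names.

```python
from collections import Counter


def build_codebook(corpus: bytes, size: int = 254, max_len: int = 5) -> list[bytes]:
    """Rough approximation: score substrings by frequency x length, keep the best."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(corpus) - n + 1):
            counts[corpus[i : i + n]] += 1  # overlapping occurrences all count
    # Assumed weighting: a one-byte code replacing an n-byte string covers n bytes.
    ranked = sorted(counts, key=lambda s: (-counts[s] * len(s), s))
    return ranked[:size]


def encode(text: bytes, codebook: list[bytes]) -> bytes:
    """Greedy longest-match encoder using a reverse codebook (entry -> code)."""
    if not 0 < len(codebook) <= 254:
        raise ValueError("need 1..254 entries; codes 254 and 255 mark literals")
    reverse = {entry: code for code, entry in enumerate(codebook)}
    max_len = max(len(entry) for entry in codebook)
    out = bytearray()
    pending = bytearray()  # unmatched bytes waiting to be written as literals

    def flush() -> None:
        while pending:
            chunk = pending[:256]  # 255 + length byte covers at most 256 bytes
            del pending[:256]
            if len(chunk) == 1:
                out.append(254)  # single-byte literal
            else:
                out.append(255)  # literal string of len(chunk) bytes
                out.append(len(chunk) - 1)
            out.extend(chunk)

    i = 0
    while i < len(text):
        # Try the longest candidate first, down to a single byte.
        for n in range(min(max_len, len(text) - i), 0, -1):
            code = reverse.get(text[i : i + n])
            if code is not None:
                flush()
                out.append(code)
                i += n
                break
        else:
            pending.append(text[i])
            i += 1
    flush()
    return bytes(out)
```

Because every overlapping occurrence is counted, this scoring favors long, overlapping fragments: on `b"the the the"` the top-scoring entries are 5-byte strings such as `b"e the"` and `b"the t"`. That bias is one reason a manual tuning pass like the one the author describes is useful. Greedy longest match is also not guaranteed to find the shortest encoding.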
An alternative for your project could be the shoco library, which supports generating a custom compression model for your language.
Regarding "algorithm - How does the smaz compression library work?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/33331552/