c++ - LZW decompression

Reposted by: 行者123 · Updated: 2023-12-02 10:02:12

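I'm implementing the LZW algorithm in C++.

The dictionary size is supplied by the user, with a minimum of 256 so that it works for binary files. When the dictionary fills up, the index wraps around to 0 and new entries start overwriting the old ones from there.

For example, if I take the Alice in Wonderland script and compress it with a dictionary size of 512, I get this dictionary.

But I have a problem with decompression: the dictionary produced while decompressing the compressed file looks like this.

My decompression code looks like this: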

struct dictionary
{
    vector<unsigned char> entry;
    vector<bool> bits;
};

void decompress(dictionary dict[], vector<bool> file, int dictionarySize, int numberOfBits)
{
    //in this example
    //dictionarySize = 512, tells the max size of the dictionary, and goes back to 0 if it reaches 513
    //numberOfBits = log2(512) = 9
    //dictionary dict[] contains bits and strings (strings can be empty)
    // dict[0] = 
    //            entry = (unsigned char)0
    //            bits = (if numberOfBits = 9) 000000001
    // dict[255] = 
    //            entry = (unsigned char)255
    //            bits = (if numberOfBits = 9) 011111111
    // so the next entry will be dict[next] (next is currently 256)
    // dict[256] = 
    //            entry = what gets added in the code below
    //            bits = 100000000
    // all the bits are already set previously (dictionary size is int dictionarySize) so in this case all the bits from 0 to 511 are already set, entries are set from 0 to 255, so extended ASCII


    vector<bool> currentCode;
    vector<unsigned char> currentString;
    vector<unsigned char> temp;

    int next=256;
    bool found=false;

    for(int i=0;i<file.size();i+=numberOfBits)
    {
        for(int j=0;j<numberOfBits;j++)
        {
            currentCode.push_back(file[i+j]);
        }

        for(int j=0;j<dictionarySize;j++)
        {
            // when the currentCode (size numberOfBits) gets found in the dictionary
            if(currentCode==dict[j].bits)
            {
                currentString = dict[j].entry;

                // if the current string isn't empty, then the character was found in the dictionary
                if(!currentString.empty())
                {
                    found = true;
                }
            }
        }

        //if the currentCode in the dictionary has a string value attached to it
        if(found)
        {
            for(int j=0;j<currentString.size();j++)
            {
                cout<<currentString[j];
            }

            temp.push_back(currentString[0]);

            // so it doesnt just push 1 character into the dictionary
            // example, if first read character is 'r', it is already in the dictionary so it doesnt get added 
            if(temp.size()>1)
            {
                // if next is more than 511, writing to that index would cause an error, so it resets back to 0 and goes back up
                if(next>dictionarySize-1) //next > 512-1
                {
                    next = 0;
                }
                dict[next].entry.clear();
                dict[next].entry = temp;
                next++;
            }

            //temp = currentString;
        }
        else
        {
            currentString = temp;
            currentString.push_back(temp[0]);

            for(int j=0;j<currentString.size();j++)
            {
                cout<<currentString[j];
            }

            // if next is more than 511, writing to that index would cause an error, so it resets back to 0 and goes back up
            if(next>dictionarySize-1)
            {
                next = 0;
            }
            dict[next].entry.clear();
            dict[next].entry = currentString;
            next++;

            //break;
        }

        temp = currentString;

        // currentCode gets cleared, and written into in the next iteration
        currentCode.clear();

        //cout<<endl;
        found = false;
    }
}

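I'm currently stuck and don't know what to change to fix the output.
I also noticed that if I make the dictionary big enough that it never wraps around (it never reaches the end and restarts from 0), it works.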

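Best answer

Start small

The file you are using contains far too much data to debug with. Start with short strings instead. I took this nice example from the Wiki: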

Input: "abacdacacadaad"

step    input           match   output  new_entry   new_index
                                        a           0
                                        b           1
                                        c           2
                                        d           3
1       abacdacacadaad  a       0       ab          4
2       bacdacacadaad   b       1       ba          5
3       acdacacadaad    a       0       ac          6
4       cdacacadaad     c       2       cd          7
5       dacacadaad      d       3       da          8
6       acacadaad       ac      6       aca         9
7       acadaad         aca     9       acad        10
8       daad            da      8       daa         11
9       ad              a       0       ad          12
10      d               d       3       

Output: "0102369803"
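The encoding table above can be reproduced with a few lines of standard C++. This is only a minimal sketch under some assumptions: the name lzw_encode_small and the alphabet parameter are made up for illustration, the codes are returned as plain integers (no bit packing), and the dictionary is allowed to grow without limit.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: LZW-encode `input` over a caller-supplied alphabet.
// The alphabet characters get codes 0,1,2,... in the order given.
std::vector<int> lzw_encode_small(const std::string& input, const std::string& alphabet)
{
    std::map<std::string, int> dict;
    for (char c : alphabet) {
        int code = static_cast<int>(dict.size());
        dict.emplace(std::string(1, c), code);
    }

    std::vector<int> out;
    std::string match;                       // longest dictionary match so far
    for (char c : input) {
        assert(dict.count(std::string(1, c)) && "symbol outside the alphabet");
        std::string extended = match + c;
        if (dict.count(extended)) {
            match = extended;                // keep growing the match
        } else {
            out.push_back(dict.at(match));   // emit the code of the match
            int code = static_cast<int>(dict.size());
            dict.emplace(extended, code);    // new entry: match + next symbol
            match.assign(1, c);              // restart matching at the next symbol
        }
    }
    if (!match.empty()) out.push_back(dict.at(match));  // flush the last match
    return out;
}
```

Calling lzw_encode_small("abacdacacadaad", "abcd") gives the codes 0 1 0 2 3 6 9 8 0 3, the same sequence as the Output row of the table, so each loop iteration can be compared against one table step.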

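So you can step through your code and cross-check the input, the output, and the dictionary contents against the table. Once that works, do the same for decoding: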

Input: "0102369803"

step    input   output  new_entry   new_index
                        a           0
                        b           1
                        c           2
                        d           3
1       0       a       
2       1       b       ab          4
3       0       a       ba          5
4       2       c       ac          6
5       3       d       cd          7
6       6       ac      da          8
7       9       aca     aca         9
8       8       da      acad        10
9       0       a       daa         11
10      3       d       ad          12

Output: "abacdacacadaad"
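The decoding table can be sketched the same way. Again, this is just an illustration with made-up names (lzw_decode_small, alphabet) and no dictionary limit. The point worth stepping through is step 7: code 9 arrives before entry 9 exists in the decoder, so the string has to be rebuilt as the previous string plus its own first character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode integer LZW codes over a caller-supplied alphabet.
std::string lzw_decode_small(const std::vector<int>& codes, const std::string& alphabet)
{
    std::vector<std::string> dict;
    for (char c : alphabet) dict.push_back(std::string(1, c));

    std::string out;
    std::string prev;                        // string decoded from the previous code
    for (int code : codes) {
        assert(code >= 0 && static_cast<std::size_t>(code) <= dict.size());
        std::string cur;
        if (static_cast<std::size_t>(code) < dict.size()) {
            cur = dict[code];                // ordinary case: code is already known
        } else {
            assert(!prev.empty());
            cur = prev + prev[0];            // code not yet in the dictionary (step 7 in the table)
        }
        // The decoder lags the encoder by one step: the entry is prev + first char of cur.
        if (!prev.empty()) dict.push_back(prev + cur[0]);
        out += cur;
        prev = cur;
    }
    return out;
}
```

Feeding it the codes 0 1 0 2 3 6 9 8 0 3 with the alphabet "abcd" returns "abacdacacadaad", and the dictionary entries appear in the same order as in the new_entry column.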

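After that, move on to files and to handling dictionary clearing.

Bitstream

Once LZW works on a small alphabet, you can try the full alphabet and bit-level encoding. An LZW stream can be encoded at any bit length (not just 8/16/32/64 bits), and the choice can greatly affect the compression ratio, depending on the properties of the data. So I would aim for uniform access to the data at a variable (or predefined) bit length.

I was a bit curious, so I coded a simple C++/VCL example of the compression: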
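To make the idea of an arbitrary code width concrete, here is a small sketch of fixed-width packing in standard C++. It is not the VCL code from this answer: pack_codes and unpack_codes are hypothetical names, bits are written most significant bit first, the last byte is zero padded, and the number of codes is assumed to be known to the reader (the padding could otherwise look like an extra code when the width is below 8).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack the low `bits` bits of every code into bytes, MSB first.
std::vector<std::uint8_t> pack_codes(const std::vector<std::uint32_t>& codes, int bits)
{
    assert(bits >= 1 && bits <= 24);
    std::vector<std::uint8_t> out;
    std::uint32_t acc = 0;   // pending bits, right aligned
    int pending = 0;         // number of valid bits in acc (< 8 between codes)
    for (std::uint32_t code : codes) {
        acc = (acc << bits) | (code & ((1u << bits) - 1u));
        pending += bits;
        while (pending >= 8) {               // flush every complete byte
            pending -= 8;
            out.push_back(static_cast<std::uint8_t>(acc >> pending));
        }
        acc &= (1u << pending) - 1u;         // keep only the unflushed bits
    }
    if (pending > 0)                         // zero pad the final partial byte
        out.push_back(static_cast<std::uint8_t>(acc << (8 - pending)));
    return out;
}

// Read back `count` codes of `bits` bits each.
std::vector<std::uint32_t> unpack_codes(const std::vector<std::uint8_t>& bytes,
                                        int bits, std::size_t count)
{
    assert(bits >= 1 && bits <= 24);
    const std::uint32_t mask = (1u << bits) - 1u;
    std::vector<std::uint32_t> codes;
    std::uint32_t acc = 0;
    int pending = 0;
    for (std::uint8_t b : bytes) {
        if (codes.size() >= count) break;    // ignore padding after the last code
        acc = (acc << 8) | b;
        pending += 8;
        while (pending >= bits && codes.size() < count) {
            pending -= bits;
            codes.push_back((acc >> pending) & mask);
        }
        acc &= (1u << pending) - 1u;
    }
    return codes;
}
```

With a width of 4 bits, the codes 0 1 0 2 3 6 9 8 0 3 pack into the five bytes 01 02 36 98 03, which is a handy case for checking the bit order by eye.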

//---------------------------------------------------------------------------
// LZW
const int LZW_bits=12;              // encoded bitstream size
const int LZW_size=1<<LZW_bits;     // dictionary size
// bitstream R/W
DWORD bitstream_tmp=0;
//---------------------------------------------------------------------------
// return LZW_bits from dat[adr,bit] and increment position (adr,bit)
DWORD bitstream_read(BYTE *dat,int siz,int &adr,int &bit,int bits)
    {
    DWORD a=0,m=(1<<bits)-1;
    // save tmp if enough bits
    if (bit>=bits){ a=(bitstream_tmp>>(bit-bits))&m; bit-=bits; return a; }
    for (;;)
        {
        // insert byte
        bitstream_tmp<<=8;
        bitstream_tmp&=0xFFFFFF00;
        bitstream_tmp|=dat[adr]&255;
        adr++; bit+=8;
        // save tmp if enough bits
        if (bit>=bits){ a=(bitstream_tmp>>(bit-bits))&m; bit-=bits; return a; }
        // end of data
        if (adr>=siz) return 0;
        }
    }
//---------------------------------------------------------------------------
// write LZW_bits from a to dat[adr,bit] and increment position (adr,bit)
// return true if buffer is full
bool bitstream_write(BYTE *dat,int siz,int &adr,int &bit,int bits,DWORD a)
    {
    a<<=32-bits;        // align to MSB
    // save tmp if aligned
    if ((adr<siz)&&(bit==32)){ dat[adr]=(bitstream_tmp>>24)&255; adr++; bit-=8; }
    if ((adr<siz)&&(bit==24)){ dat[adr]=(bitstream_tmp>>16)&255; adr++; bit-=8; }
    if ((adr<siz)&&(bit==16)){ dat[adr]=(bitstream_tmp>> 8)&255; adr++; bit-=8; }
    if ((adr<siz)&&(bit== 8)){ dat[adr]=(bitstream_tmp    )&255; adr++; bit-=8; }
    // process all bits of a
    for (;bits;bits--)
        {
        // insert bit
        bitstream_tmp<<=1;
        bitstream_tmp&=0xFFFFFFFE;
        bitstream_tmp|=(a>>31)&1;
        a<<=1; bit++;
        // save tmp if aligned
        if ((adr<siz)&&(bit==32)){ dat[adr]=(bitstream_tmp>>24)&255; adr++; bit-=8; }
        if ((adr<siz)&&(bit==24)){ dat[adr]=(bitstream_tmp>>16)&255; adr++; bit-=8; }
        if ((adr<siz)&&(bit==16)){ dat[adr]=(bitstream_tmp>> 8)&255; adr++; bit-=8; }
        if ((adr<siz)&&(bit== 8)){ dat[adr]=(bitstream_tmp    )&255; adr++; bit-=8; }
        }
    return (adr>=siz);
    }
//---------------------------------------------------------------------------
bool str_compare(char *s0,int l0,char *s1,int l1)
    {
    if (l1<l0) return false;
    for (;l0;l0--,s0++,s1++)
     if (*s0!=*s1) return false;
    return true;
    }
//---------------------------------------------------------------------------
AnsiString LZW_encode(AnsiString raw)
    {
    AnsiString lzw="";
    int i,j,k,l;
    int adr,bit;
    DWORD a;
    const int siz=32;                   // bitstream buffer
    BYTE buf[siz];
    AnsiString dict[LZW_size];          // dictionary
    int dicts=0;                        // actual size of dictionary

    // init dictionary
    for (dicts=0;dicts<256;dicts++) dict[dicts]=char(dicts);    // full 8bit binary alphabet
//  for (dicts=0;dicts<4;dicts++) dict[dicts]=char('a'+dicts);  // test alphabet "a,b,c,d"

    l=raw.Length();
    adr=0; bit=0;
    for (i=0;i<l;)
        {
        i&=i;
        // find match in dictionary
        for (j=dicts-1;j>=0;j--)
         if (str_compare(dict[j].c_str(),dict[j].Length(),raw.c_str()+i,l-i))
            {
            i+=dict[j].Length();
            if (i<l)    // add new entry in dictionary (if not end of input)
                {
                // clear dictionary if full
                if (dicts>=LZW_size) dicts=256; // full 8bit binary alphabet
//              if (dicts>=LZW_size) dicts=4;   // test alphabet "a,b,c,d"
                else{
                    dict[dicts]=dict[j]+AnsiString(raw[i+1]); // AnsiString index starts from 1 hence the +1
                    dicts++;
                    }
                }
            a=j; j=-1; break;       // full binary output
//          a='0'+j; j=-1; break;   // test ASCII output
            }
        // store result to bitstream
        if (bitstream_write(buf,siz,adr,bit,LZW_bits,a))
            {
            // append buf to lzw
            k=lzw.Length();
            lzw.SetLength(k+adr);
            for (j=0;j<adr;j++) lzw[j+k+1]=buf[j];
            // reset buf
            adr=0;
            }
        }
    if (bit)
        {
        // store the remaining bits with zero padding
        bitstream_write(buf,siz,adr,bit,LZW_bits-bit,0);
        }
    if (adr)
        {
        // append buf to lzw
        k=lzw.Length();
        lzw.SetLength(k+adr);
        for (j=0;j<adr;j++) lzw[j+k+1]=buf[j];
        }
    return lzw;
    }
//---------------------------------------------------------------------------
AnsiString LZW_decode(AnsiString lzw)
    {
    AnsiString raw="";
    int adr,bit,siz,ix;
    DWORD a;
    AnsiString dict[LZW_size];          // dictionary
    int dicts=0;                        // actual size of dictionary

    // init dictionary
    for (dicts=0;dicts<256;dicts++) dict[dicts]=char(dicts);    // full 8bit binary alphabet
//  for (dicts=0;dicts<4;dicts++) dict[dicts]=char('a'+dicts);  // test alphabet "a,b,c,d"

    siz=lzw.Length();
    adr=0; bit=0; ix=-1;
    for (adr=0;(adr<siz)||(bit>=LZW_bits);)
        {
        a=bitstream_read(lzw.c_str(),siz,adr,bit,LZW_bits);
//      a-='0';                         // test ASCII input
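        // clear dictionary if full (the reset size must match the encoder)
        if (dicts>=LZW_size){ dicts=256; ix=-1; }   // full 8bit binary alphabet
//      if (dicts>=LZW_size){ dicts=4; ix=-1; }     // test alphabet "a,b,c,d"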
        // new dictionary entry
        if (ix>=0)
            {
            if (a>=dicts){ dict[dicts]=dict[ix]+AnsiString(dict[ix][1]); dicts++; }
            else         { dict[dicts]=dict[ix]+AnsiString(dict[a ][1]); dicts++; }
            } ix=a;
        // update decoded output
        raw+=dict[a];
        }
    return raw;
    }
//---------------------------------------------------------------------------

And here is the output when the // test ASCII input lines are used:

txt="abacdacacadaad"
enc="0102369803"
dec="abacdacacadaad"

AnsiString is the only VCL thing I used. It is just a self-allocating string variable; beware that its indexes start from 1.

AnsiString s;
s[5]              // character access (1 is first character) 
s.Length()        // returns size
s.c_str()         // returns char*
s.SetLength(size) // resize

So just use whatever string type you have ...

If you do not have BYTE and DWORD, use unsigned char and unsigned int instead ...
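For anyone porting this away from VCL, the substitutions could look roughly like the sketch below. The helpers at1 and append_bytes are hypothetical names that do not appear in the code above; they only mirror the 1-based s[i] access and the SetLength-then-copy pattern of AnsiString on top of std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

using BYTE  = unsigned char;   // 8-bit unsigned value
using DWORD = std::uint32_t;   // 32-bit unsigned value

// Hypothetical helper: 1-based character access, like AnsiString's s[i].
inline char& at1(std::string& s, std::size_t i)
{
    assert(i >= 1 && i <= s.size());
    return s[i - 1];           // std::string itself is 0-based
}

// Hypothetical helper: append n raw bytes, replacing the
// "k=s.Length(); s.SetLength(k+n); for (...) s[j+k+1]=buf[j];" pattern.
inline void append_bytes(std::string& s, const BYTE* buf, std::size_t n)
{
    s.append(reinterpret_cast<const char*>(buf), n);
}

// Rough equivalents of the remaining calls:
//   s.Length()     -> s.size()
//   s.c_str()      -> s.c_str()  (const char* in std::string)
//   s.SetLength(n) -> s.resize(n)
```

Keeping the 1-based access behind one helper makes it harder to miss an off-by-one when translating lines such as dict[j]+AnsiString(raw[i+1]).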

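It looks like it works for long texts too (bigger than the dictionary and/or the bitstream buffer). Beware, however, that the clearing can be done at a few different places in the code, but it must be synchronized between the encoder and the decoder; otherwise the data is corrupted after the first clear.

The example can use either just the "a,b,c,d" alphabet or the full 8-bit one. It is currently set to 8-bit. To change it, uncomment the // test ASCII input lines and comment out the // full 8bit binary alphabet lines in the code.

To test crossing buffer and dictionary boundaries, you can play with: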
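One way to see what "synchronized" means is a tiny encoder and decoder pair that share the same reset rule. This is a sketch under assumptions rather than the code above: the names encode_with_reset and decode_with_reset and the max_size parameter are invented here, the alphabet must consist of distinct characters, codes stay as integers, and the rule chosen is "when the dictionary is full, reset it to the alphabet instead of adding the next entry". Other rules work too, as long as both sides apply the same one at the same code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Codes = std::vector<std::size_t>;

static std::map<std::string, std::size_t> base_dict(const std::string& alphabet)
{
    std::map<std::string, std::size_t> d;
    for (char c : alphabet) { std::size_t code = d.size(); d.emplace(std::string(1, c), code); }
    return d;
}

Codes encode_with_reset(const std::string& in, const std::string& alphabet, std::size_t max_size)
{
    assert(max_size >= alphabet.size());
    auto dict = base_dict(alphabet);
    Codes out;
    std::string match;
    for (char c : in) {
        if (dict.count(match + c)) { match += c; continue; }
        out.push_back(dict.at(match));
        if (dict.size() >= max_size) {
            dict = base_dict(alphabet);          // full: reset instead of adding
        } else {
            std::size_t code = dict.size();
            dict.emplace(match + c, code);
        }
        match.assign(1, c);
    }
    if (!match.empty()) out.push_back(dict.at(match));
    return out;
}

std::string decode_with_reset(const Codes& codes, const std::string& alphabet, std::size_t max_size)
{
    assert(max_size >= alphabet.size());
    std::vector<std::string> dict;
    for (char c : alphabet) dict.push_back(std::string(1, c));
    std::string out, prev;
    for (std::size_t code : codes) {
        std::string cur;
        if (prev.empty()) {
            cur = dict.at(code);                 // first code: nothing to add yet
        } else if (dict.size() >= max_size) {
            dict.resize(alphabet.size());        // same reset point as the encoder
            cur = dict.at(code);
        } else {
            assert(code <= dict.size());
            cur = code < dict.size() ? dict[code] : prev + prev[0];
            dict.push_back(prev + cur[0]);
        }
        out += cur;
        prev = cur;
    }
    return out;
}
```

With a deliberately tiny limit such as max_size = 6 the reset triggers every few codes, which makes a desynchronized decoder show up after a handful of symbols instead of somewhere deep inside a large file.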

const int LZW_bits=12;              // encoded bitstream size
const int LZW_size=1<<LZW_bits;     // dictionary size

and also:

const int siz=32;                   // bitstream buffer

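constants ... They also affect performance, so tune them to your liking.
Beware that bitstream_write is not optimized and can be sped up considerably ...

Also, for debugging the 4-bit-aligned encoding I print the encoded data in hex (the hex string is twice as long as its ASCII version), like this (ignore the VCL stuff):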

AnsiString txt="abacdacacadaadddddddaaaaaaaabcccddaaaaaaaaa",enc,dec,hex;
enc=LZW_encode(txt);
dec=LZW_decode(enc);

// convert to hex
hex=""; for (int i=1,l=enc.Length();i<=l;i++) hex+=AnsiString().sprintf("%02X",enc[i]);

mm_log->Lines->Add("\""+txt+"\"");
mm_log->Lines->Add("\""+hex+"\"");
mm_log->Lines->Add("\""+dec+"\"");
mm_log->Lines->Add(AnsiString().sprintf("ratio: %i%%",(100*enc.Length()/dec.Length())));

Result:

"abacdacacadaadddddddaaaaaaaabcccddaaaaaaaaa"
"06106206106306410210510406106410FFFFFF910A10706110FFFFFFD10E06206311110910FFFFFFE11410FFFFFFD0"
"abacdacacadaadddddddaaaaaaaabcccddaaaaaaaaa"
ratio: 81%
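The FFFFFF runs in the hex string look like sign extension rather than real data: when char is signed, a byte of 0x80 or more is promoted to a negative int before %02X prints it. Dropping the four FFFFFF prefixes leaves 70 hex digits, that is 35 bytes, which agrees with the 81% ratio for the 43-character input. A sketch of a hex dump that avoids this in standard C++ (the name to_hex is made up for illustration) could be:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: hex dump of raw bytes, always two digits per byte.
std::string to_hex(const std::string& data)
{
    std::string hex;
    char buf[3];                             // two hex digits + terminating zero
    for (char c : data) {
        // Go through unsigned char first so bytes >= 0x80 are not sign-extended.
        unsigned value = static_cast<unsigned char>(c);
        std::snprintf(buf, sizeof buf, "%02X", value);
        hex += buf;
    }
    return hex;
}
```

With this version the dump is always exactly twice as long as the encoded data, so 12-bit codes line up as groups of three hex digits.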

This c++ - LZW decompression question corresponds to a similar question found on Stack Overflow: https://stackoverflow.com/questions/62086402/
