algorithm - 我们是否必须创建一个所有节点都有 3 个子节点的树？-6ren

algorithm - 我们是否必须创建一个所有节点都有 3 个子节点的树？

转载作者：行者123 更新时间：2023-12-02 15:28:21

构建霍夫曼树的步骤输入是唯一字符及其出现频率的数组，输出是霍夫曼树。

为每个唯一字符创建一个叶子节点，并构建一个所有叶子节点的最小堆(Min Heap用作优先级队列。频率字段的值用于比较最小堆中的两个节点。最初，出现频率最低的字符在根)
从最小堆中提取频率最小的两个节点。
创建一个新的内部节点，其频率等于两个节点频率的总和。将第一个提取的节点作为其左子节点，将另一个提取的节点作为其右子节点。将此节点添加到最小堆。
重复步骤#2 和#3，直到堆只包含一个节点。剩下的节点就是根节点，树就完成了。

在堆中，一个节点最多可以有 2 个子节点，对吧？

因此，如果我们想在三元系统(即使用符号 0 、 1 和 2 的编码词)中推广霍夫曼算法，我们可以做什么？我们是否必须创建一个所有节点都有 3 个子节点的树？

编辑:

我认为应该是这样的。

构建霍夫曼树的步骤输入是唯一字符及其出现频率的数组，输出是哈夫曼树。

为每个唯一字符创建一个叶节点，并构建所有叶节点的最小堆
从最小堆中提取频率最小的三个节点。
创建一个新的内部节点，其频率等于三个节点频率的总和。将提取的第一个节点作为其左子节点，将第二个提取的节点作为其中间子节点，将第三个提取的节点作为其右子节点。将此节点添加到最小堆。
重复步骤#2 和#3，直到堆只包含一个节点。剩下的节点就是根节点，树就完成了。

我们如何证明该算法产生最佳三元码？

编辑 2:假设我们有频率 5、9、12、13、16、45。

它们的数量是偶数，所以我们添加一个频率为 0 的虚拟节点。我们是否将其放在数组的末尾？那么，会不会是这样呢？

enter image description here

那么会不会有下面的堆呢？

enter image description here

然后:

enter image description here

然后:

enter image description here

还是我理解错了？

最佳答案

是的!您必须创建具有 3 个子节点的所有节点。为什么是3？你也可以有 n-ary huffman coding使用带有 n 个子节点的节点。这棵树看起来像这样-(对于 n=3)

    *
  / | \
 *  *  *
   /|\
  * * *

三元码字的哈夫曼算法

为了便于引用，我给出了算法。

HUFFMAN_TERNARY(C)
{
   IF |C|=EVEN 
   THEN ADD DUMMY CHARACTER Z WITH FREQUENCY 0.

   N=|C|
   Q=C;  //WE ARE BASICALLY HEAPIFYING THE CHARACTERS

   FOR I=1 TO floor(N/2)
   {
     ALLOCATE NEW_NODE;
     LEFT[NEW_NODE]= U= EXTRACT_MIN(Q)
     MID[NEW_NODE] = V= EXTRACT_MIN(Q)
     RIGHT[NEW_NODE]=W= EXTRACT_MIN(Q)
     F[NEW_NODE]=F[U]+F[V]+F[W];
     INSERT(Q,NEW_NODE);
   }
   RETURN EXTRACT_MIN(Q);
  } //END-OF-ALGO

Why are we adding extra nodes? To make the number of nodes odd.(Why?) Because we want to get out of the for loop with just one node in Q.

Why floor(N/2)? At first we take 3 nodes. Then replace with it 1 node.There are N-2 nodes. After that we always take 3 nodes (if not available 1 node,it is never possible to get 2 nodes because of the dummy node) and replace with 1. In each iteration we are reducing it by 2 nodes. So that's why we are using the term floor(N/2).

使用一些示例字符集在纸上自行检查。你会明白的。

正确性

我在这里引用了 Cormen, Rivest 的“Introduction to Algorithms”。

证明:一步一步的数学证明太长，无法在这里发布，但它与书中给出的证明非常相似。

想法

Any optimal tree has the lowest three frequencies at the lowest level.(We have to prove this).(using contradiction) Suppose it is not the case then we could switch a leaf with a higher frequency from the lowest level with one of the lowest three leaves and obtain a lower average length. Without any loss of generality, we can assume that all the three lowest frequencies are the children of the same node. if they are at the same level, the average length does not change irrespective of where the frequencies are). They only differ in the last digit of their codeword (one will be 0,1 or 2).

Again as the binary codewords we have to contract the three nodes and make a new character out of it having frequency=total of three node's(character's) frequencies. Like binary Huffman codes, we see that the cost of the optimal tree is the sum of the tree with the three symbols contracted and the eliminated sub-tree which had the nodes before contraction. Since it has been proved that the sub-tree has to be present in the final optimal tree, we can optimize on the tree with the contracted newly created node.

例子

假设字符集包含频率5,9,12,13,16,45。
现在 N=6-> 偶数。所以添加 freq=0 的虚拟字符
现在N=7，C中的freq为0,5,9,12,13,16,45
现在使用 min priority queue得到 3 个值。 0然后 5然后 9 .
添加它们，在优先级队列中插入 freq=0+9+5 的新字符。这样继续。

树会变成这样

         100
        / | \
       /  |  \
      /   |   \
     39   16  45  step-3
   /  | \
  14  12 13  step-2
/ | \  
0 5 9   step-1

最终证明

我现在将直接模拟 Cormen 的证明。

引理1.设C是一个字母表，其中属于C的每个字符d的频率为c.freq。让x , y 和 z 是 C 中频率最低的三个字符。那么存在C 的最佳前缀代码，其中 x 、y 和 z 的代码字具有相同的长度，仅在最后一位不同。

证明:

想法

首先考虑生成任意最优前缀码的任意树T。
然后我们将对其进行修改，使其成为代表另一个最佳前缀的树，使得字符 x、y、z 在最大深度处显示为兄弟节点。
如果我们可以构建这样一棵树，那么 x、y 和 z 的代码字将具有相同的长度，仅在最后一位不同。

证明--

令 a,b,c 为 T 中最大深度的同级叶子的三个字符。

Without loss of generality, we assume that a.freq < b:freq < c.freq and x.freq < y.freq < z.freq. Since x.freq and y.freq and z.freq are the 3 lowest leaf frequencies, in order (means there are no frequencies between them) and a.freq , b.freq and c.freq are two arbitrary frequencies, in order, we have x.freq < a:freq and y.freq < b.freq and z.freq< c.freq.

In the remainder of the proof we can have x.freq=a.freq or y.freq=b.freq or z.freq=c.freq. But if x.freq=b.freq or x.freq=c.freq or y.freq=c.freq then all of them are same. WHY?? Let's see. Suppose x!=y,y!=z,x!=z but z=c and x<y<z in order and aa<b<c. Also x!=a. --> x<a y!=b. --> y<b z!=c. --> z<c but z=c is given. This contradicts our assumption. (Thus proves). The lemma would be trivially true. Thus we will assume x!=b and x!=c.

 T1

     *                       |
   / | \                     |
  *  *  x                    +---d(x)
/ | \                        |
y *  z                           +---d(y) or d(z)
 /|\                         |
a b c                        +---d(a) or d(b) or d(c) actually d(a)=d(b)=d(c)

T2
     *              
   / | \
  *  *  a
/ | \
y *  z
 /|\
x b c               

T3

     *              
   / | \
  *  *  x
/ | \
b *  z
 /|\
x y c

T4
     *              
   / | \
  *  *  a
/ | \
b *  c
 /|\
x y z


In case of T1 costt1= x.freq*d(x)+ cost_of_other_nodes + y.freq*d(y) + z.freq*d(z) + d(a)*a.freq + b.freq*d(b) + c.freq*d(c)
In case of T2 costt2= x.freq*d(a)+ cost_of_other_nodes + y.freq*d(y) + z.freq*d(z) + d(x)*a.freq + b.freq*d(b) + c.freq*d(c)
       costt1-costt2= x.freq*[d(x)-d(a)]+0             +    0        +     0       + a.freq[d(a)-d(x)]+0       +    0
                    = (a.freq-x.freq)*(d(a)-d(x))
                    >= 0      
So costt1>=costt2.                    --->(1)

Similarly we can show costt2 >= costt3--->(2)

And                   costt3 >= costt4--->(3)

From (1),(2) and (3) we get 

costt1>=costt4.-->(4)

But T1 is optimal.

So costt1<=costt4 -->(5)

From (4) and (5) we get costt1=costt2.

SO, T4 is an optimal tree in which x,y,and z appears as sibling leaves at maximum depth, from which the lemma follows.

引理 2

Let C be a given alphabet with frequency c.freq defined for each character c belonging to C. Let x , y, z be three characters in C with minimum frequency. Let C1 be the alphabet C with the characters x and y removed and a new character z1 added, so that C1 = C - {x,y,z} union {z1}. Define f for C1 as for C, except that z1.freq=x.freq+y.freq+z.freq. Let T1 be any tree representing an optimal prefix code for the alphabet C1. Then the tree T , obtained from T1 by replacing the leaf node for z with an internal node having x , y and z as children, represents an optimal prefix code for the alphabet C.

证明:

看，我们正在从 T1-> T 进行转换。所以我们必须找到一种方法来表达 T，即用 costt1 表示 costt。

     *                            *              
   / | \                        / | \
  *  *  *                      *  *  *
/ | \                           / | \
* *  *        ---->             * z1 *
 /|\                            
x y z

对于属于(C-{x,y,z})的c，dT(c)=dT1(c)。 [深度对应T和T1树]

因此 c.freq*dT(c)=c.freq*dT1(c) .

自 dT(x)=dT(y)=dT(z)=dT1(z1)+1

So we have `x.freq*dT(x)+y.freq*dT(y)+z.freq*dT(z)=(x.freq+y.freq+z.freq)(dT1(z)+1)`
                                                  = `z1.freq*dT1(z1)+x.freq+y.freq+z.freq`
Adding both side the cost of other nodes which is same in both T and T1.
x.freq*dT(x)+y.freq*dT(y)+z.freq*dT(z)+cost_of_other_nodes= z1.freq*dT1(z1)+x.freq+y.freq+z.freq+cost_of_other_nodes

So costt=costt1+x.freq+y.freq+z.freq
or equivalently

  costt1=costt-x.freq-y.freq-z.freq ---->(1)

现在我们用反证法证明引理。

我们现在用反证法证明引理。假设 T 不代表C 的最佳前缀代码。然后存在最佳树 T2，使得 costt2 < costt .不失一般性(根据引理 1)，T2 有 x 和 y 和 z 作为 sibling 。设 T3 是树 T2，其中 x 和 y 和 z 的共同父代被替换为 a叶 z1 与频率 z1.freq=x.freq+y.freq+z.freq然后

costt3 = costt2-x.freq-y.freq-z.freq
       <  costt-x.freq-y.freq-z.freq
       =  costt1  (From 1)

与 T1 代表最佳前缀码的假设相矛盾对于 C1。因此，T必须代表字母表C的最佳前缀码。

-Proved.

程序 HUFFMAN 产生最佳前缀代码。证明:直接来自引理 1 和 2。

注意.: 术语来自Introduction to Algorithms 3rd edition Cormen Rivest

关于algorithm - 我们是否必须创建一个所有节点都有 3 个子节点的树？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/29200112/

文章推荐： java - 大整数的底除法和欧几里得除法

文章推荐： angularjs - Angular JS 智能表筛选集合

文章推荐： oracle - 如何使用 RECORD TYPE 作为 OUT 参数执行存储过程

文章推荐：用于嵌入 iframe 的 Dropbox 文件链接

树结构之MongoDb 使用的到底是 B 树，还是 B+ 树？
关于 B 树与 B+ 树，网上有一个比较经典的问题：为什么 MongoDb 使用 B 树，而 MySQL 索引使用 B+ 树? 但实际上 MongoDb 真的用的是 B 树吗?
c# - 持久(基于磁盘)R 树(或 R* 树)
如何将 R* Tree 实现为持久(基于磁盘)树？保存 R* 树索引或保存叶值的文件的体系结构是什么？注意:此外，如何在这种持久性 R* 树中执行插入、更新和删除操作？注意事项二:我已经实现了一个
java - 给定另一个 AST 树，在 Java 中创建一个 AST 树
目前，我正在努力用 Java 表示我用 SML 编写的 AST 树，这样我就可以随时用 Java 遍历它。我想知道是否应该在 Java 中创建一个 Node 类，其中包含我想要表示的数据，以及一个数
c++ - C++ 中任何好的范围查询库(使用 K-D 树、四叉树或 R 树)
我之前用过这个库http://www.cs.umd.edu/~mount/ANN/ .但是，它们不提供范围查询实现。我猜是否有一个 C++ 范围查询实现(圆形或矩形)，用于查询二维数据。谢谢。最佳
为什么MySQL数据库索引选择使用B+树?
在进一步分析为什么MySQL数据库索引选择使用B+树之前，我相信很多小伙伴对数据结构中的树还是有些许模糊的，因此我们由浅入深一步步探讨树的演进过程，在一步步引出B树以及为什么MySQL数据库索引选择
操作系统的那棵“树”---06
操作系统的那棵“树” 今天从一颗开始，我们看看如何从小树苗长成一颗苍天大树。运转CPU CPU运转起来很简单，就是不断的从内存取值执行。 CPU没有好好运转 IO是个耗费时间的活，如果CPU在取值
r - 从物种列表制作简单的系统发育树状图(树)
我想为海洋生物学类(class)制作一个简单的系统发育树作为教育示例。我有一个具有分类等级的物种列表: Group <- c("Benthos","Benthos","Benthos","Be
c++ - 树，无法正确删除节点
我从这段代码中删除节点时遇到问题，如果我插入数字 12 并尝试删除它，它不会删除它，我尝试调试，似乎当它尝试删除时，它出错了树的。但是，如果我尝试删除它已经插入主节点的节点，它将删除它，或者我插入数字
haskell - 如何在Haskell中实现B+树？
B+ 树的叶节点链接在一起。将 B+ 树的指针结构视为有向图，它不是循环的。但是忽略指针的方向并将其视为链接在一起的无向叶节点会在图中创建循环。在 Haskell 中，如何将叶子构造为父内部节点的子
GWT 树，开幕事件
我在 GWT 中使用树控件。我有一个自定义小部件，我将其添加为 TreeItem: Tree testTree = new Tree(); testTree.addItem(myWidget); 我想
c - 树/链表结构的遍历
它有点像混合树/链表结构。这是我定义结构的方式 struct node { nodeP sibling; nodeP child; nodeP parent; char
c - 树:使用队列进行层序遍历
我编写了使用队列遍历树的代码，但是下面的出队函数生成错误，head = p->next 是否有问题？我不明白为什么这部分是错误的。 void Levelorder(void) { node *tmp,
javascript - 将平面数组解析为嵌套结构(树)
例如，我想解析以下数组: var array1 = ["a.b.c.d", "a.e.f.g", "a.h", "a.i.j", "a.b.k"] 进入: var json1 = { "nod
java - 树-路径总和
问题 -> 给定一棵二叉树和一个和，确定该树是否具有从根到叶的路径，使得沿路径的所有值相加等于给定的和。我的解决方案 -> public class Solution { public bo
带有列的 Java 树
我有一个创建 java 树的任务，它包含三列:运动名称、运动类别中的运动计数和上次更新。类似的东西显示在下面的图像上: 如您所见，有 4 种运动:水上运动、球类运动、跳伞运动和舞蹈运动。当我展开 sk
mysql - H2数据库中的B+树
我想在 H2 数据库中实现 B+ Tree，但我想知道，B+ Tree 功能在 H2 数据库中可用吗？最佳答案 H2 已经使用了 B+ 树(PageBtree 类)。关于mysql - H2数据库
java - 字符串数组(树)
假设我们有 5 个字符串数组: String[] array1 = {"hello", "i", "cat"}; String[] array2 = {"hello", "i", "am"}; Str
每个节点具有恒定时间的 C++ 树
我正在处理树。每个节点都有带有 Tree * 值的对象。我读取的数据如下所示: 1 2 2 ... 这意味着，将 1 作为 0 的子节点，将 2 作为 1 的子节点，将 3 作为 o 2 的子节点。在
JavaScript 树/树网格库
我正在寻找一个好的 JavaScript 树/树网格包。现在——在你回答之前: 它需要能够在大量节点上正常运行。可能有 1,000 个兄弟节点。它需要能够在 2 或 3 秒内绘制到 1,000 个节点
c++ - 这个代码块到底是如何工作的[树]？
下面的代码块究竟是如何工作的？更具体地说，程序如何知道返回哪个选项？ return ancestor (node1->left(), node2) || ancestor

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城