gpt4 book ai didi

compression - 霍夫曼后缀码

转载 作者:行者123 更新时间:2023-12-04 08:51:06 25 4
gpt4 key购买 nike

我正在尝试有效地为给定的一组字符及其概率(即一组没有任何其他后缀的单词)构建二进制后缀代码。

我的基本想法是使用霍夫曼算法的实现来构造前缀代码。通过反转代码字,我得到了一个无后缀的代码。虽然此解决方案有效,但它似乎不是最佳的,因为我必须反转可变长度的代码字(因此我需要一个结合位移位的查找表)。

有没有办法修改霍夫曼算法以更有效地创建后缀代码?

最佳答案

我会将 HuffmanNode 实现为

class HuffmanNode implements Comparable<HuffmanNode>
{
// data
private String text;
private double frequency;

// linkage
private HuffmanNode left;
private HuffmanNode right;
private HuffmanNode parent;

public HuffmanNode(String text, double frequency)
{
this.text = text;
this.frequency = frequency;
}
public HuffmanNode(HuffmanNode n0, HuffmanNode n1)
{
if(n0.frequency < n1.frequency)
{
left = n0;
right = n1;
}else if(n0.frequency > n1.frequency)
{
left = n1;
right = n0;
}else
{
if(n0.text.compareTo(n1.text) < 0)
{
left = n0;
right = n1;
}else
{
left = n1;
right = n0;
}
}
left.parent = this;
right.parent = this;
text = left.text + right.text;
frequency = left.frequency + right.frequency;
}

public HuffmanNode getParent() {
return parent;
}

public HuffmanNode getLeft() {
return left;
}

public HuffmanNode getRight() {
return right;
}

public String getText()
{
return text;
}

@Override
public int compareTo(HuffmanNode o) {
if(frequency < o.frequency)
return -1;
else if(frequency > o.frequency)
return 1;
else
return text.compareTo(o.text);
}

public Collection<HuffmanNode> leaves()
{
if(left == null && right == null)
{
Set<HuffmanNode> retval = new HashSet<>();
retval.add(this);
return retval;
}
else if(left == null || right == null)
{
Set<HuffmanNode> retval = new HashSet<>();
if(left != null)
retval.addAll(left.leaves());
if(right != null)
retval.addAll(right.leaves());
retval.add(this);
return retval;
}
else
{
Set<HuffmanNode> retval = new HashSet<>();
retval.addAll(left.leaves());
retval.addAll(right.leaves());
return retval;
}
}

public String toString()
{
return "{" + text + " -> " + frequency + "}";
}
}

此类表示霍夫曼树中的单个节点。
它具有从(子)树获取所有叶子的便捷方法。

然后,您可以轻松构建树:
private Map<String,String> buildTree(String text)
{
List<HuffmanNode> nodes = new ArrayList<>();
for(Map.Entry<String,Double> en : frequency(text).entrySet())
{
nodes.add(new HuffmanNode(en.getKey(), en.getValue()));
}
java.util.Collections.sort(nodes);
while(nodes.size() != 1)
{
HuffmanNode n0 = nodes.get(0);
HuffmanNode n1 = nodes.get(1);

// build merged node
HuffmanNode newNode = new HuffmanNode(nodes.get(0), nodes.get(1));
nodes.remove(n0);
nodes.remove(n1);

// calculate insertion point
int insertionPoint = - java.util.Collections.binarySearch(nodes, newNode) - 1;

// insert
nodes.add(insertionPoint, newNode);
}

// build lookup table
Map<String, String> lookupTable = new HashMap<>();
for(HuffmanNode leaf : nodes.iterator().next().leaves())
{
String code = "";
HuffmanNode tmp = leaf;
while(tmp.getParent() != null)
{
if(tmp.getParent().getLeft() == tmp)
code = "0" + code;
else
code = "1" + code;
tmp = tmp.getParent();
}
lookupTable.put(leaf.getText(), code);
}
return lookupTable;
}

通过更改构建代码的方法(例如在下一个数字之前而不是附加它),您可以更改正在生成的代码。

关于compression - 霍夫曼后缀码,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/42086279/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com