gpt4 book ai didi

python - 在霍夫曼算法中编码中间叶子

转载 作者:塔克拉玛干 更新时间:2023-11-03 03:51:23 24 4
gpt4 key购买 nike

我正在使用霍夫曼编码对一组项目进行编码,除了最终编码之外,我还想返回编码后的中间节点,但是子节点的数据连接到中间节点。

例如,如果我要对这组符号和概率进行编码:

tree = [('a',0.25),('b',0,25),('c',0.25),('d',0.125),('e',0.125)]

我希望返回以下内容:

tree = [['ab','0'],['cde','1'],['a','00'],['b','01'],['c','10'],['de','11'],['d','110'],['e','111']]

我正在使用以下代码生成哈夫曼树:

import heapq

#symbfreq = list of symbols and associated frequencies
def encode(symbfreq):
#create a nested list as a tree
tree = [[wt, [sym, ""]] for sym, wt in symbfreq]
#turn the tree into a heap
heapq.heapify(tree)
while len(tree)>1:
#pop the lowest two nodes off the heap, sorted on the length
lo, hi = sorted([heapq.heappop(tree), heapq.heappop(tree)], key=len)
#add the next bit of the codeword
for pair in lo[1:]:
pair[1] = '0' + pair[1]
for pair in hi[1:]:
pair[1] = '1' + pair[1]
#push a new node, which is the sum of the two lowest probability nodes onto the heap
heapq.heappush(tree, [lo[0]+hi[0]] + lo[1:] + hi[1:])
return sorted(heapq.heappop(tree)[1:], key=lambda p: (len(p[-1]), p))

霍夫曼算法是:

1.Create a leaf node for each symbol and add it to the priority queue.
2.While there is more than one node in the queue:
3.Remove the node of highest priority (lowest probability) twice to get two nodes.
4.Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
5.Add the new node to the queue.
6.The remaining node is the root node and the tree is complete.

我一辈子都想不出一种方法来阻止中间节点被覆盖(即我想保留在第 4 阶段创建的中间节点)。

最佳答案

我不知道在构建时如何做(构建永远不会弹出的输出树的一部分),但您可以很容易地检索中间节点:

huffman_tree = encode(tree)
complete_tree = huffman_tree

get_intermediate_node = lambda val, arr : ''.join( [ char for char,binary in itertools.ifilter( lambda node : node[1].startswith( val ),arr)] )

for val in range( next_power_of_two( len(huffman_tree) ) ):
bvalue = bin(val)[2:]
node = [ get_intermediate_node( bvalue , huffman_tree) , bvalue ]
if node not in complete_tree:
complete_tree.append( node)

print sorted( complete_tree , key=lambda p: (len(p[-1]), p) )

>>> [['ab', '0'], ['cde', '1'], ['a', '00'], ['b', '01'], ['c', '10'],
['de', '11'], ['', '100'], ['', '101'], ['d', '110'], ['e', '111']]

(你仍然需要修剪空节点)

关于python - 在霍夫曼算法中编码中间叶子,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/19766697/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com