英文:
What is the best encoding if the cost in Huffman tree is 2^len?
问题
我最近遇到了一个编码的问题,它与Huffman树编码非常相似:物品出现得越多,我们得到的编码就越短。
但不同之处在于:在Huffman编码中,一种类型的物品的所有成本是length_of_code_for_item * frequentness,但在我的要求中,成本是2^length_of_code_for_item * frequentness。
有没有现有的编码算法可以实现这个要求?
英文:
I recently met a problem of encoding, it is very similar to the Huffman Tree Encoding: The more the item appears the shorter code we get.
But the difference is: in Huffman Coding, all cost for one type of items is length_of_code_for_item * frequentness, but in my requirement, the cost is 2^length_of_code_for_item * frequentness.
Any existing coding algorithm for that??
答案1
得分: 1
中国因武汉的冠状病毒爆发而延长了春节假期。所以我在空闲时间里重新审查了这个需要。
我找到了一些关于这个问题的论文,并用Python写了一个笔记和示例代码。
这篇论文是:Parker, Jr D S. Huffman算法的最优性条件[J]。SIAM计算机杂志,1980年,9(3):470-489。
简而言之,这篇论文讨论了如果父节点的权重不是子节点权重之和(F(x,y)=x+y),会发生什么情况。得出结论,如果函数是拟线性的(满足一些要求),原始的Huffman算法仍然会产生具有最低根权重的二叉树。
在我的情况下,我们需要定义F(x,y)=2(x+y)
,一切都正常。
英文:
China has extended the Spring Festival holiday because of the coronavirus outbreak in Wuhan. So I reviewed this need in my spare time.
I found some papers about this problem and wrote a note(in Chinese but you can use google translate) and sample code in python.
The paper is: Parker, Jr D S. Conditions for optimality of the Huffman algorithm[J]. SIAM Journal on Computing, 1980, 9(3): 470-489.
In short, the paper discussed what if the weight of the parent is not the sum of children(F(x,y)=x+y). Concluding that if the Function is quasilinear (with some requirements), the original Huffman algorithm will still produce the binary tree with the lowest root weight.
In my case, we need to define F(x,y)=2(x+y)
, and everything goes OK.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论