Golang: What do zlib.NewWriterLevelDict / zlib.NewReaderDict do?

Question

It's not clear to me what using the dict does when using it with zlib. Does anyone know what its purpose is or how it works? I've been searching Google and YouTube with very little luck in learning what it's doing. My assumption was that it was filtering the inputs and outputs, but that doesn't seem to be it. It looks like it's using it as some kind of key for compression and decompression. Is that correct? Any help is appreciated.

Answer 1

Score: 3

At every point in the uncompressed data, zlib searches the previous 32K of uncompressed data for a sequence of bytes that matches the data at the current position. Much of the compression comes from coding the distance back to, and the length of, the matching sequence instead of the bytes themselves.
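
To make the "distance back and length" idea concrete, here is a minimal Go sketch of the search. It is a deliberately naive brute-force scan over the previous 32K, not how zlib actually finds matches (zlib uses hash chains), and `longestMatch` is a hypothetical helper, not part of `compress/zlib`. The 258-byte cap is deflate's maximum match length.

```go
package main

import "fmt"

const (
	// windowSize is the 32K history that deflate (and therefore zlib) searches.
	windowSize = 32 * 1024
	// maxMatch is the longest match length deflate can encode.
	maxMatch = 258
)

// longestMatch is a naive illustration, not zlib's hash-chain search: it scans
// up to windowSize bytes before pos for the longest run of bytes that matches
// the data starting at pos. It returns the distance back to that run and its
// length; a length of 0 means there is nothing to refer back to.
func longestMatch(data []byte, pos int) (dist, length int) {
	start := pos - windowSize
	if start < 0 {
		start = 0
	}
	for cand := start; cand < pos; cand++ {
		n := 0
		// As in LZ77, a match may overlap pos; cand+n < pos+n keeps it in bounds.
		for n < maxMatch && pos+n < len(data) && data[cand+n] == data[pos+n] {
			n++
		}
		if n > length {
			dist, length = pos-cand, n
		}
	}
	return dist, length
}

func main() {
	data := []byte("hello world, hello zlib")
	// Index 13 is the start of the second "hello ".
	dist, length := longestMatch(data, 13)
	fmt.Printf("at 13: go back %d, copy %d bytes\n", dist, length) // at 13: go back 13, copy 6 bytes
	// At index 0 there is no history yet, so nothing can match.
	dist, length = longestMatch(data, 0)
	fmt.Printf("at 0: go back %d, copy %d bytes\n", dist, length) // at 0: go back 0, copy 0 bytes
}
```

At position 0 the search has no history at all, which is the start-up disadvantage a dictionary is meant to remove.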

When zlib starts at the beginning of the uncompressed data, there is no previous 32K! And for the first 32K, zlib is operating at somewhat of a disadvantage, without a full 32K of history.

Providing a dictionary gives zlib a head start by giving it up to 32K of "previous" data that it doesn't have to compress. You would try to populate that dictionary with sequences of bytes that you might expect to see in the data that you're compressing.
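
A minimal sketch of priming the compressor with `zlib.NewWriterLevelDict`, assuming a made-up log message and two made-up dictionaries: one built from fragments the message is expected to contain, one of unrelated text. The `compress` helper and all the data are illustrative only. Exact byte counts depend on the Go release's flate implementation, so the program prints them rather than quoting them; the related dictionary should give the smallest output, because most of the message becomes a single back-reference into it.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
)

// compress zlib-compresses data at BestCompression. A nil dict gives a plain
// stream; a non-nil dict primes the history via zlib.NewWriterLevelDict.
func compress(data, dict []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := zlib.NewWriterLevelDict(&buf, zlib.BestCompression, dict)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	msg := []byte(`{"type":"event","level":"info","service":"auth","message":"user logged in"}`)
	// Hypothetical dictionaries: one made of fragments we expect in every
	// message, one made of unrelated text.
	related := []byte(`{"type":"event","level":"info","service":"auth","message":"`)
	unrelated := []byte("The quick brown fox jumps over the lazy dog.")

	for _, c := range []struct {
		name string
		dict []byte
	}{{"no dictionary", nil}, {"unrelated dictionary", unrelated}, {"related dictionary", related}} {
		out, err := compress(msg, c.dict)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-20s %d bytes\n", c.name, len(out))
	}
}
```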

The bargain you make with zlib is that you will provide that exact same dictionary (up to 32K) on the decompression end, so that zlib doesn't have to include it in the compressed data. zlib does, however, encode a check value of that dictionary (its Adler-32 checksum) in the header, so that you can verify (to some extent) that you have the right dictionary at the other end, and perhaps even use that check value to select among several dictionaries that may be used.

If you're compressing a large input, this head start on the first 32K really won't make much difference. However, if you're trying to compress short sequences of bytes, and you know what to expect in those short sequences, then a dictionary can make a huge difference.

huangapple
  • Published on 2022-06-14 05:46:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/72609397.html