Question

It's not clear to me what the dict does when used with zlib. Does anyone know what its purpose is or how it works? I've been searching Google and YouTube with very little luck in learning what it does. My assumption was that it filters the inputs and outputs, but that doesn't seem to be it. It looks like it is used as some kind of key for compression and decompression. Is that correct? Any help is appreciated.

Answer 1

Score: 3

At every point in the uncompressed data, zlib uses the previous 32K of uncompressed data in which to search for a sequence of bytes that matches the data at the current position. Much of the compression comes from coding the distance back and the length of the matching sequence, instead of the bytes themselves.
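To make the "distance back and length" idea concrete, here is a toy sketch of the search. It is a brute-force illustration only, not zlib's actual matcher (which uses hash chains and enforces minimum and maximum match lengths); the function name `longestMatch` is made up for this example.

```go
package main

import "fmt"

// longestMatch scans the window (the bytes before pos, at most windowSize of
// them) for the longest run that matches the data starting at pos.
// It returns the distance back and the match length; length 0 means no match.
// Hypothetical helper for illustration, not part of any zlib API.
func longestMatch(data []byte, pos, windowSize int) (distance, length int) {
	start := pos - windowSize
	if start < 0 {
		start = 0
	}
	for cand := start; cand < pos; cand++ {
		n := 0
		// A match may run past pos (overlap), as LZ77-style coders allow.
		for pos+n < len(data) && data[cand+n] == data[pos+n] {
			n++
		}
		if n > length {
			distance, length = pos-cand, n
		}
	}
	return distance, length
}

func main() {
	data := []byte("the cat sat on the mat")
	// Position 15 is the start of the second "the ".
	d, l := longestMatch(data, 15, 32*1024)
	fmt.Printf("distance=%d length=%d\n", d, l) // distance=15 length=4
}
```

Instead of emitting the four bytes of the second "the ", a compressor can emit the pair (distance 15, length 4), which is usually cheaper to encode.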

When zlib starts at the beginning of the uncompressed data, there is no previous 32K! And for the first 32K, zlib is operating at somewhat of a disadvantage, without a full 32K of history.

Providing a dictionary gives zlib a head start by giving it a "previous" 32K of data that it doesn't have to compress. You would try to populate that dictionary with sequences of bytes that you might expect to see in the data that you're compressing.
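In Go, this corresponds to `zlib.NewWriterLevelDict` on the compression side and `zlib.NewReaderDict` on the decompression side. The following is a minimal round-trip sketch; the dictionary contents and the JSON-like message are invented for illustration, and the helper names are hypothetical.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// compressWithDict compresses data, priming zlib's history with dict.
func compressWithDict(data, dict []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw, err := zlib.NewWriterLevelDict(&buf, zlib.BestCompression, dict)
	if err != nil {
		return nil, err
	}
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes the remaining data and writes the trailer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompressWithDict reverses compressWithDict; dict must be the same bytes.
func decompressWithDict(compressed, dict []byte) ([]byte, error) {
	zr, err := zlib.NewReaderDict(bytes.NewReader(compressed), dict)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// Assumed example data: byte sequences we expect to see in our messages.
	dict := []byte(`{"user":"","status":"active","role":"member"}`)
	msg := []byte(`{"user":"alice","status":"active","role":"member"}`)

	compressed, err := compressWithDict(msg, dict)
	if err != nil {
		panic(err)
	}
	restored, err := decompressWithDict(compressed, dict)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", bytes.Equal(restored, msg))
}
```

Note that the dictionary is not a key in the cryptographic sense. It is just shared history that both sides agree on in advance.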

The bargain you make with zlib is that you will provide that exact same 32K of dictionary on the decompression end, so that zlib doesn't have to include it in the compressed data. zlib does, however, encode a check value of that dictionary (its Adler-32 checksum) in the stream header, so that you can verify, to some extent, that you have the right dictionary at the other end, and perhaps even use that check value to select among several dictionaries that may be in use.
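The check value can be inspected directly. In the zlib stream format, bit 5 of the second header byte (FDICT) signals a preset dictionary, and the next four bytes hold the dictionary's Adler-32 in big-endian order. The sketch below reads that value and also shows Go rejecting a mismatched dictionary with `zlib.ErrDictionary`. The helper names `dictID`, `usesDict`, and `compress` are hypothetical, and the HTTP-like sample data is invented.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"encoding/binary"
	"errors"
	"fmt"
	"hash/adler32"
)

// dictID extracts the dictionary check value from a zlib stream header.
// ok is false if the stream is too short or the FDICT flag is not set.
func dictID(stream []byte) (id uint32, ok bool) {
	if len(stream) < 6 || stream[1]&0x20 == 0 {
		return 0, false
	}
	return binary.BigEndian.Uint32(stream[2:6]), true
}

// usesDict reports whether the stream's check value matches dict.
// A match is strong evidence, not proof, that dict is the right one.
func usesDict(stream, dict []byte) bool {
	id, ok := dictID(stream)
	return ok && id == adler32.Checksum(dict)
}

// compress is a small helper that panics on error to keep the sketch short.
func compress(data, dict []byte) []byte {
	var buf bytes.Buffer
	zw, err := zlib.NewWriterLevelDict(&buf, zlib.DefaultCompression, dict)
	if err != nil {
		panic(err)
	}
	if _, err := zw.Write(data); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	dict := []byte("GET /index.html HTTP/1.1\r\nHost: ")
	other := []byte("some other dictionary")
	stream := compress([]byte("GET /about.html HTTP/1.1\r\nHost: example.com\r\n"), dict)

	fmt.Println("matches dict: ", usesDict(stream, dict))
	fmt.Println("matches other:", usesDict(stream, other))

	// Supplying the wrong dictionary to the reader fails up front.
	_, err := zlib.NewReaderDict(bytes.NewReader(stream), other)
	fmt.Println("wrong dict rejected:", errors.Is(err, zlib.ErrDictionary))
}
```

With several candidate dictionaries, a receiver could call something like `usesDict` on each one to pick the dictionary the sender used before creating the reader.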

If you're compressing large input, this head start on the first 32K won't make much difference. However, if you're trying to compress short sequences of bytes, and you know what to expect in those short sequences, then a dictionary can make a huge difference.
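A quick way to see the effect on short inputs is to compress the same small message with and without a dictionary and compare sizes. This is a rough sketch with invented sample data; the exact byte counts depend on the input and the Go version, so none are quoted here. `compressedSize` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
)

// compressedSize returns the size of the zlib stream produced for data.
// Passing a nil dict compresses without a preset dictionary.
func compressedSize(data, dict []byte) (int, error) {
	var buf bytes.Buffer
	zw, err := zlib.NewWriterLevelDict(&buf, zlib.DefaultCompression, dict)
	if err != nil {
		return 0, err
	}
	if _, err := zw.Write(data); err != nil {
		return 0, err
	}
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	// Assumed example: the dictionary holds the boilerplate shared by messages.
	dict := []byte(`{"user":"","status":"active","role":"member","email":"@example.com"}`)
	msg := []byte(`{"user":"alice","status":"active","role":"member","email":"alice@example.com"}`)

	plain, err := compressedSize(msg, nil)
	if err != nil {
		panic(err)
	}
	withDict, err := compressedSize(msg, dict)
	if err != nil {
		panic(err)
	}
	fmt.Printf("original: %d bytes\n", len(msg))
	fmt.Printf("zlib without dict: %d bytes\n", plain)
	fmt.Printf("zlib with dict:    %d bytes\n", withDict)
}
```

For a message this short, the stream without a dictionary has almost no history to match against, while the dictionary version can encode most of the message as back-references.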

Golang: What does the zlib.NewWriterLevelDict / zlib.NewReaderDict do?
