Deflate algorithm gives different results with different software
Question
I am currently reading about the deflate algorithm, and as part of learning I picked one file and zipped it using different methods. What confuses me very much is that the different methods produced different bytes for the compressed file.
I tried zipping the file with WinRAR, 7-Zip, the Java zlib library (the ZipOutputStream class), and also manually by running deflate directly on the source data (the Deflater class). All four methods produced completely different bytes.
My goal was just to see all of the methods produce the same byte array, but that was not the case, and my question is: why could that be? By checking the file headers, I made sure that all of these programs actually used the deflate algorithm.
Can anyone help with this? Is it possible for the deflate algorithm to produce different compressed results for exactly the same source file?
Answer 1
Score: 0
The reason is that Deflate is a format, not an algorithm. Compression happens in two steps. The first is LZ77 matching, where the encoder can choose among a practically unlimited number of possible algorithms for finding matches. Then the LZ77 messages are encoded with Huffman trees, and again there is a very large number of choices about how to define those trees. Additionally, from time to time in the stream of LZ77 messages, it can pay off to redefine the trees and start a new block, or not. Here too there is an enormous number of choices about how to split the blocks.
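To make those encoder choices visible, here is a minimal sketch that feeds the same bytes to java.util.zip.Deflater three times with different settings. The class name DeflateVariants, the helper names, and the sample input are made up for illustration; only Deflater and its constants come from the JDK. It assumes raw deflate output (nowrap = true), so that no zlib header or checksum is mixed into the comparison.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;

class DeflateVariants {
    // Hypothetical sample input: a phrase repeated so that LZ77 has matches to find.
    static byte[] sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("the same source data, ");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Produce a raw deflate stream (nowrap = true: no zlib header or checksum).
    static byte[] deflate(byte[] input, int level, int strategy) {
        Deflater deflater = new Deflater(level, true);
        deflater.setStrategy(strategy);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = sample();
        // Same input, three different encoder configurations.
        byte[] withMatching = deflate(data, Deflater.BEST_COMPRESSION, Deflater.DEFAULT_STRATEGY);
        byte[] huffmanOnly = deflate(data, Deflater.BEST_COMPRESSION, Deflater.HUFFMAN_ONLY);
        byte[] stored = deflate(data, Deflater.NO_COMPRESSION, Deflater.DEFAULT_STRATEGY);

        System.out.println("input bytes:        " + data.length);
        System.out.println("LZ77 + Huffman:     " + withMatching.length);
        System.out.println("Huffman only:       " + huffmanOnly.length);
        System.out.println("stored (level 0):   " + stored.length);
        System.out.println("matching == huffman-only? " + Arrays.equals(withMatching, huffmanOnly));
    }
}
```

All three outputs are valid deflate streams for the same input, yet they differ in both length and content. WinRAR and 7-Zip ship their own encoders, so they are free to make yet other choices.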
Answer 2
Score: 0
There are many, many deflate representations of the same data. Surely you have already noticed that you can set a compression level. That could only have an effect if there were different ways to compress the same data. What you get depends on the compression level, any other compression settings, the software you are using, and the version of that software.
The only guarantee is that when you compress and then decompress, you get exactly what you started with. There is no guarantee that you get the same thing when you decompress and then compress, nor does there need to be, nor should there be.
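The round-trip guarantee can be checked directly. The following is a rough sketch, assuming Java's java.util.zip.Deflater and Inflater with the default zlib wrapping; the class name RoundTripCheck, its helper methods, and the sample input are hypothetical. It compresses the same bytes at every level from 0 to 9 and verifies that each result inflates back to the original.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class RoundTripCheck {
    // Hypothetical sample input, repetitive enough to compress well.
    static byte[] sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 300; i++) {
            sb.append("compress, then decompress; ");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compress with the given level (0..9) into a zlib-wrapped deflate stream.
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate a zlib-wrapped stream; bad input is reported as an unchecked exception.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or dictionary-dependent stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid zlib stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = sample();
        for (int level = 0; level <= 9; level++) {
            byte[] packed = compress(data, level);
            boolean same = Arrays.equals(data, decompress(packed));
            System.out.println("level " + level + ": " + packed.length
                    + " compressed bytes, round trip ok = " + same);
        }
    }
}
```

The compressed sizes vary with the level, while the round-trip check holds for every one of them. That is the only property a decompressor relies on.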
Why do you have that goal?