Why do unix compress and Go's compress/lzw produce different files that the other decoder cannot read?

Question

I compressed a file in a terminal with compress file.txt and got (as expected) file.txt.Z.

When I pass that file to ioutil.ReadFile in Go,

buf0, err := ioutil.ReadFile("file.txt.Z")

I get the error (the line above is 116):

finder_test.go:116: lzw: invalid code

I found that Go accepts the file if I compress it with the compress/lzw package. I used code from a website (https://www.socketloop.com/references/golang-compress-lzw-newwriter-function-example) that does that, and modified only this line:

outputFile, err := os.Create("file.txt.lzw")

I changed .lzw to .Z, then used the resulting file.txt.Z in the Go code at the top, and it worked with no error.

Note: file.txt is 16.0 kB, the unix-compressed file.txt.Z is 7.8 kB, and the Go-compressed file.txt.Z is 8.2 kB.

To understand why this happened, I tried to run

uncompress.real file.txt.Z

and it did not work. I got

file.txt.Z: not in compressed format

I need to compress files with LZW using one compressor (preferably unix compress) and then feed the same compressed files to two different programs, one written in C and the other in Go, because I intend to compare their performance. The C program only accepts files compressed with unix compress, and the Go program only accepts files compressed with Go's compress/lzw.

Can someone explain why that happened? Why are the two .Z files not equivalent? How can I overcome this?

Note: I am working on Ubuntu installed in VirtualBox on a Mac.

Answer 1

Score: 2

A .Z file does not contain only LZW-compressed data: it also starts with a 3-byte header. The Go LZW code does not generate that header because it is meant to compress data, not to produce a .Z file.
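To see which kind of file you have, you can inspect the first three bytes. The sketch below assumes the usual compress(1) layout: the magic bytes 0x1F 0x9D, followed by a flags byte whose low five bits give the maximum code width and whose high bit marks block mode. The names zHeader and parseZHeader are made up for this illustration, and file.txt.Z is the file name from the question.

```go
package main

import (
	"fmt"
	"os"
)

// zHeader holds the fields of the 3-byte header written by compress(1).
// (Hypothetical type, introduced only for this example.)
type zHeader struct {
	MaxBits   int  // maximum code width: low 5 bits of the third byte
	BlockMode bool // high bit of the third byte
}

// parseZHeader reports whether buf starts with the .Z magic bytes
// 0x1F 0x9D and, if so, decodes the flags byte that follows them.
func parseZHeader(buf []byte) (zHeader, bool) {
	if len(buf) < 3 || buf[0] != 0x1f || buf[1] != 0x9d {
		return zHeader{}, false
	}
	return zHeader{
		MaxBits:   int(buf[2] & 0x1f),
		BlockMode: buf[2]&0x80 != 0,
	}, true
}

func main() {
	// "file.txt.Z" is the file name used in the question.
	buf, err := os.ReadFile("file.txt.Z")
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	if h, ok := parseZHeader(buf); ok {
		fmt.Printf("compress(1) file: max bits %d, block mode %v\n", h.MaxBits, h.BlockMode)
	} else {
		fmt.Println("no .Z magic header; possibly a raw LZW stream")
	}
}
```

Note that skipping the three header bytes is probably not enough to make Go's decoder accept the rest: compress/lzw limits codes to 12 bits, while compress(1) may use wider codes, as the flags byte indicates.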

Answer 2

Score: 1

Presumably you only want to test the performance of your two algorithms (or of some third-party ones), not of the compression algorithms themselves. If so, you could write a shell script that calls the compress command on the required files or directories, and then call this script from your C or Go program. This is one way to work around the problem, though it leaves open the other parts of your question about the correct way to use the compression libraries.
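On the Go side, a rough sketch of that idea could skip the wrapper script and invoke compress directly through os/exec. It assumes a compress binary is on the PATH and that it supports the -c flag (write to standard output); compressCmd is a hypothetical helper name, and file.txt is the file from the question.

```go
package main

import (
	"fmt"
	"os/exec"
)

// compressCmd builds (but does not run) a command equivalent to
// `compress -c <path>`, which writes the .Z stream to standard output.
// (Hypothetical helper, introduced only for this example.)
func compressCmd(path string) *exec.Cmd {
	return exec.Command("compress", "-c", path)
}

func main() {
	// Output runs the command and captures its standard output.
	out, err := compressCmd("file.txt").Output()
	if err != nil {
		fmt.Println("compress failed:", err)
		return
	}
	fmt.Printf("compressed to %d bytes\n", len(out))
}
```

If you benchmark this way, keep the call to compress outside the timed section so that only your own algorithm is measured.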

Answer 3

Score: 0

There is an ancient bug named "alignment bit groups" behind this question. I've described it on Wikipedia under "Special output format"; please read it.

I've implemented a new library, lzws. It has all possible options:

  1. --without-magic-header (-w) - disable magic header
  2. --max-code-bit-length (-b) - set max code bit length (9-16)
  3. --raw (-r) - disable block mode
  4. --msb (-m) - enable most significant bit
  5. --unaligned-bit-groups (-u) - enable unaligned bit groups

You can use the options in any combination; all combinations have been tested. I am sure you can find a combination suitable for the Go LZW implementation.

You can use the ruby-lzws binding if you prefer Ruby.

Posted by huangapple on March 20, 2017
Link: https://go.coder-hub.com/42889664.html
