How do I decode a zlib stream in Go?

Question

What is the issue?

I cannot decode valid compressed chunks from a zlib stream using Go's zlib package.

I have prepared a GitHub repo that contains code and data illustrating the issue: https://github.com/andreyst/zlib-issue.

What are those chunks?

They are messages generated by a text game server (MUD). The game server sends a compressed stream of messages in multiple chunks; the first chunk contains the zlib header and the others do not.

I captured two chunks (the first and the second) with a proxy called "mcclient", a sidecar that provides compression for MUD clients that do not support it. It is written in C and uses the C zlib library to decode compressed chunks.

The chunks are in the "chunks" directory and are numbered 0 and 1. *.in files contain compressed data, *.out files contain the uncompressed data captured from mcclient, and *.log files contain the status of zlib decompression (the return code of the inflate call).

A special all.in chunk is chunk 0 concatenated with chunk 1.

Why do I think they are valid?

  1. mcclient successfully decompresses the input chunks with C's zlib without any issues. The *.log status shows 0, which is Z_OK, meaning no error in zlib parlance.
  2. zlib-flate -uncompress < chunks/all.in works without errors under Linux and decompresses to the same content. Under macOS it also decompresses to the same content, but with the warning zlib-flate: WARNING: zlib code -5, msg = input stream is complete but output may still be valid. That looks expected, because the chunks do not contain an "official" stream end.
  3. The Python code in decompress.py correctly decompresses both all.in and the 0/1 chunks without any issues.

What is the issue with Go's zlib?

See main.go: it tries to decompress those chunks, starting with all.in and then decompressing chunks 0 and 1 step by step.

An attempt to decode all.in (func all()) somewhat succeeds: the decompressed data is the same, but the zlib reader returns the error flate: corrupt input before offset 446.

In the real-life scenario of decompressing chunk by chunk (func stream()), the zlib reader decodes the first chunk with the expected data but returns the error flate: corrupt input before offset 32, and the subsequent attempt to decode chunk 1 fails completely.

The question

Is it possible to use Go's zlib package in some kind of "streaming" mode that suits a scenario like this? Maybe I am using it incorrectly?

If not, what is the workaround? It would also be interesting to know why this is so. Is it by design? Is it just not implemented yet? What am I missing?

Answer 1

Score: 2

Notice that the error says the data at an offset after your input is corrupt. That is because of the way you are reading from the files:

	buf := make([]byte, 100000)
	n, readErr := f.Read(buf)
	if readErr != nil {
		log.Fatalf("readErr=%v\n", readErr)
	}
	fmt.Printf("Read bytes, n=%v\n", n)

	buffer := bytes.NewBuffer(buf)
	zlibReader, zlibErr := zlib.NewReader(buffer)
	if zlibErr != nil {
		log.Fatalf("zlibErr=%v\n", zlibErr)
	}

buf := make([]byte, 100000) makes a slice of 100000 bytes, all of them 0, but you only read 443 bytes in the case of all.in. Since you never shorten the slice to buf[:n], the reader encounters the trailing zeros after the valid data and concludes that the input is corrupt. That is why you get both output and an error.

As for streaming: in the case of a TCP/UDP connection you should be able to pass the connection, which is an io.Reader, straight to zlib.NewReader. To simulate that, I used an io.Pipe in the modified code:

package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	all()
	stream()

	// otherZlib(), which used github.com/4kills/go-zlib, is left out:
	// alas it hangs :(
}

func all() {
	fmt.Println("==== RUNNING DECOMPRESSION OF all.in")
	fmt.Println("")

	buf, readErr := os.ReadFile("./chunks/all.in")
	if readErr != nil {
		log.Fatalf("readErr=%v\n", readErr)
	}
	fmt.Printf("Read bytes, n=%v\n", len(buf))

	buffer := bytes.NewBuffer(buf)
	zlibReader, zlibErr := zlib.NewReader(buffer)
	if zlibErr != nil {
		log.Fatalf("zlibErr=%v\n", zlibErr)
	}

	out := new(bytes.Buffer)
	written, copyErr := io.Copy(out, zlibReader)
	if copyErr != nil {
		log.Printf("copyErr=%v\n", copyErr)
	}
	fmt.Printf("Written bytes, n=%v, out:\n%v\n", written, out.String())
	fmt.Println("")
}

func stream() {
	fmt.Println("==== RUNNING DECOMPRESSION OF SEPARATE CHUNKS")
	fmt.Println("")

	// The pipe stands in for the network connection: the goroutine plays
	// the server and writes the chunks one after another.
	pRead, pWrite := io.Pipe()
	go func() {
		buf, readErr := os.ReadFile("./chunks/0.in")
		if readErr != nil {
			log.Fatalf("readErr=%v\n", readErr)
		}
		fmt.Printf("Read chunk 0, n=%v\n", len(buf))

		written0, copy0Err := io.Copy(pWrite, bytes.NewBuffer(buf))
		if copy0Err != nil {
			log.Printf("copy0Err=%v\n", copy0Err)
		}
		fmt.Printf("Written compressed bytes, n0=%v\n", written0)

		buf, readErr = os.ReadFile("./chunks/1.in")
		if readErr != nil {
			log.Fatalf("read1Err=%v\n", readErr)
		}
		fmt.Printf("Read chunk 1, n=%v\n", len(buf))

		written1, copy1Err := io.Copy(pWrite, bytes.NewBuffer(buf))
		if copy1Err != nil {
			log.Printf("copy1Err=%v\n", copy1Err)
		}
		fmt.Printf("Written compressed bytes, n1=%v\n", written1)

		// Closing the write end signals end of input to the reader.
		pWrite.Close()
	}()

	// A single zlib reader consumes the whole stream across chunk boundaries.
	zlibReader, zlibErr := zlib.NewReader(pRead)
	if zlibErr != nil {
		log.Fatalf("zlibErr=%v\n", zlibErr)
	}

	out := new(bytes.Buffer)
	written2, copy2Err := io.Copy(out, zlibReader)
	if copy2Err != nil {
		log.Printf("copy2Err=%v\n", copy2Err)
	}
	fmt.Printf("Written decompressed bytes, n=%v, out:\n%v\n", written2, out.String())

	fmt.Println("")
}

With this code I get no errors from stream(), but I still get copyErr=unexpected EOF from all(). It looks like all.in is missing the checksum data at the end, but I figure that is just an accident.
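A possible way to see where the unexpected EOF comes from is to compare a stream that was closed with one that was only flushed. This is a sketch under assumptions: compress and decompress are hypothetical helpers, and the data is generated in memory rather than taken from all.in.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"errors"
	"fmt"
	"io"
)

// compress returns msg as a zlib stream. When terminate is false the writer
// is only flushed, so the final block and the Adler-32 checksum are missing.
func compress(msg string, terminate bool) []byte {
	var b bytes.Buffer
	w := zlib.NewWriter(&b)
	w.Write([]byte(msg))
	if terminate {
		w.Close()
	} else {
		w.Flush()
	}
	return b.Bytes()
}

// decompress reads everything it can. An unexpected EOF is reported through
// the truncated flag instead of as an error, because for an unterminated
// stream it only means the input ran out.
func decompress(data []byte) (out string, truncated bool, err error) {
	r, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return "", false, err
	}
	raw, err := io.ReadAll(r)
	if errors.Is(err, io.ErrUnexpectedEOF) {
		return string(raw), true, nil
	}
	return string(raw), false, err
}

func main() {
	for _, terminate := range []bool{true, false} {
		out, truncated, err := decompress(compress("hello\n", terminate))
		fmt.Printf("terminated=%v out=%q truncated=%v err=%v\n",
			terminate, out, truncated, err)
	}
}
```

Tolerating the unexpected EOF this way also means the checksum is never verified, so it is only appropriate when the sender is known not to terminate the stream.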

Answer 2

Score: 0

With careful debugging I found that I had passed buffer slices that were too large, which led to incorrect input being fed to decompression.

Also, it is important not to use io.Copy, which leads to EOF on the buffer and stops everything. Instead, call zlibReader.Read() directly, which decompresses everything that is currently in the buffer.
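A rough sketch of that Read-based approach follows. It is not the updated code from the repository: makeChunks and decodeStream are hypothetical names, the chunks are generated in memory, and an io.Pipe stands in for the network connection.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"errors"
	"fmt"
	"io"
)

// makeChunks compresses each message with one shared zlib writer and cuts
// the output at every flush, so only the first chunk has the zlib header.
func makeChunks(messages []string) [][]byte {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	var chunks [][]byte
	for _, m := range messages {
		zw.Write([]byte(m))
		zw.Flush()
		chunks = append(chunks, append([]byte(nil), buf.Bytes()...))
		buf.Reset()
	}
	return chunks
}

// decodeStream feeds the chunks through a pipe one at a time and collects
// whatever each Read call on the single zlib reader returns.
func decodeStream(chunks [][]byte) ([]string, error) {
	pr, pw := io.Pipe()
	go func() {
		for _, c := range chunks {
			pw.Write(c)
		}
		pw.Close() // the stream ends without a zlib trailer
	}()

	zr, err := zlib.NewReader(pr)
	if err != nil {
		return nil, err
	}
	var decoded []string
	buf := make([]byte, 4096)
	for {
		n, err := zr.Read(buf)
		if n > 0 {
			decoded = append(decoded, string(buf[:n]))
		}
		if err == io.EOF || errors.Is(err, io.ErrUnexpectedEOF) {
			return decoded, nil // input ended; nothing more to decode
		}
		if err != nil {
			return decoded, err
		}
	}
}

func main() {
	chunks := makeChunks([]string{"first message\n", "second message\n"})
	decoded, err := decodeStream(chunks)
	fmt.Printf("decoded=%q err=%v\n", decoded, err)
}
```

The important part is that one reader lives for the whole connection. Creating a new zlib reader per chunk would fail on every chunk after the first, since those chunks have no header.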

I've updated the code so it now works as expected.

huangapple
  • Published on 2021-12-31 05:42:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/70536980.html