How do I decode zlib stream in Go?
What is the issue?
I cannot decode valid compressed chunks from a zlib stream using Go's `zlib` package.
I have prepared a GitHub repo which contains code and data illustrating the issue: https://github.com/andreyst/zlib-issue.
What are those chunks?
They are messages generated by a text game server (MUD). This game server sends a compressed stream of messages in multiple chunks, the first of which contains the zlib header and the others do not.
I have captured two chunks (the first and the second) with a proxy called "mcclient", a sidecar that provides compression for MUD clients that do not support it. It is written in C and uses the C `zlib` library to decode compressed chunks.
The chunks are contained in the "chunks" directory and are numbered `0` and `1`. `*.in` files contain compressed data. `*.out` files contain uncompressed data captured from mcclient. `*.log` files contain the status of zlib decompression (the return code of the `inflate` call).
A special `all.in` chunk is chunk `0` concatenated with chunk `1`.
Why do I think they are valid?
- `mcclient` successfully decompresses the input chunks with C's `zlib` without any issues.
- The `*.log` status shows `0`, which means `Z_OK`, i.e. no errors in zlib parlance.
- `zlib-flate -uncompress < chunks/all.in` works without any errors under Linux and decompresses to the same content. Under Mac OS it also decompresses to the same content, but with the warning `zlib-flate: WARNING: zlib code -5, msg = input stream is complete but output may still be valid`, which looks expected because the chunks do not contain an "official" stream end.
- The Python code in `decompress.py` correctly decompresses both `all.in` and the `0`/`1` chunks without any issues.
What is the issue with Go's zlib?
See `main.go`: it tries to decompress those chunks, starting with `all.in` and then decompressing chunks `0` and `1` step by step.
An attempt to decode `all.in` (`func all()`) somewhat succeeds, at least in that the decompressed data is the same, but the zlib reader returns the error `flate: corrupt input before offset 446`.
When trying the real-life scenario of decompressing chunk by chunk (`func stream()`), the zlib reader decodes the first chunk into the expected data but returns the error `flate: corrupt input before offset 32`, and the subsequent attempt to decode chunk `1` fails completely.
The question
Is it possible to use Go's `zlib` package in some kind of "streaming" mode suited for a scenario like this? Maybe I am using it incorrectly?
If not, what is the workaround? It would also be interesting to know why it behaves this way. Is it by design? Is it just not implemented yet? What am I missing?
Answer 1
Score: 2
Notice that the error says the data at an offset after your input is corrupt. That is because of the way you are reading from the files:

```go
buf := make([]byte, 100000)
n, readErr := f.Read(buf)
if readErr != nil {
	log.Fatalf("readErr=%v\n", readErr)
}
fmt.Printf("Read bytes, n=%v\n", n)
buffer := bytes.NewBuffer(buf)
zlibReader, zlibErr := zlib.NewReader(buffer)
if zlibErr != nil {
	log.Fatalf("zlibErr=%v\n", zlibErr)
}
```

`buf := make([]byte, 100000)` makes a slice of 100000 bytes, all of them 0. But you are only reading 443 bytes in the case of `all.in`. Since you never shorten the slice, the reader encounters a few thousand zeros after the valid data and concludes the input is corrupt. That is why you get both output and an error.
As for streaming: in the case of a TCP/UDP connection you should be able to just pass the connection, which is an `io.Reader`, to `zlib.NewReader`. To simulate the same thing I used an `io.Pipe` in the modified code:

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
	"log"
	"os"
	// otherzlib "github.com/4kills/go-zlib" // unused unless otherZlib() is restored
)

func main() {
	all()
	stream()
	// Alas it hangs :(
	// otherZlib()
}

func all() {
	fmt.Println("==== RUNNING DECOMPRESSION OF all.in")
	fmt.Println("")

	buf, readErr := os.ReadFile("./chunks/all.in")
	if readErr != nil {
		log.Fatalf("readErr=%v\n", readErr)
	}
	fmt.Printf("Read bytes, n=%v\n", len(buf))

	buffer := bytes.NewBuffer(buf)
	zlibReader, zlibErr := zlib.NewReader(buffer)
	if zlibErr != nil {
		log.Fatalf("zlibErr=%v\n", zlibErr)
	}

	out := new(bytes.Buffer)
	written, copyErr := io.Copy(out, zlibReader)
	if copyErr != nil {
		log.Printf("copyErr=%v\n", copyErr)
	}
	fmt.Printf("Written bytes, n=%v, out:\n%v\n", written, out.String())
	fmt.Println("")
}

func stream() {
	fmt.Println("==== RUNNING DECOMPRESSION OF SEPARATE CHUNKS")
	fmt.Println("")

	pRead, pWrite := io.Pipe()
	go func() {
		buf, readErr := os.ReadFile("./chunks/0.in")
		if readErr != nil {
			log.Fatalf("readErr=%v\n", readErr)
		}
		fmt.Printf("Read 0 bytes, n=%v\n", len(buf))

		written0, copy0Err := io.Copy(pWrite, bytes.NewBuffer(buf))
		if copy0Err != nil {
			log.Printf("copy0Err=%v\n", copy0Err)
		}
		fmt.Printf("Written compressed bytes, n0=%v\n", written0)

		buf, readErr = os.ReadFile("./chunks/1.in")
		if readErr != nil {
			log.Fatalf("read1Err=%v\n", readErr)
		}
		fmt.Printf("Read 1 bytes, n=%v\n", len(buf))

		written1, copy1Err := io.Copy(pWrite, bytes.NewBuffer(buf))
		if copy1Err != nil {
			log.Printf("copy1Err=%v\n", copy1Err)
		}
		fmt.Printf("Written compressed bytes, n1=%v\n", written1)

		pWrite.Close()
	}()

	zlibReader, zlibErr := zlib.NewReader(pRead)
	if zlibErr != nil {
		log.Fatalf("zlibErr=%v\n", zlibErr)
	}

	out := new(bytes.Buffer)
	written2, copy2Err := io.Copy(out, zlibReader)
	if copy2Err != nil {
		log.Printf("copy2Err=%v\n", copy2Err)
	}
	fmt.Printf("Written decompressed bytes, n2=%v, out:\n%v\n", written2, out.String())
	fmt.Println("")
}
```
With this code I get no errors from `stream()`, but I still get a `copyErr=unexpected EOF` error from `all()`. It looks like `all.in` is missing the checksum data at the end, but I figure that is just an accident.
Answer 2
Score: 0
With careful debugging I was able to see that I had incorrectly passed too-large buffer slices, which led to incorrect input buffers being fed to decompression.
Also, it is important not to use `io.Copy`, which leads to an EOF on the buffer and stops everything; instead, use plain `zlibReader.Read()`, which decompresses whatever is currently available in the buffer.
I've updated the code so it now works as expected.