Cannot gzip slices with more than 32768 bytes in Go 1.5 on Mac OS X
Question
I am trying to compress byte slices in Go using `compress/gzip`. Whenever I compress slices with lengths longer than 2^15 on my laptop, every byte with an index of 2^15 or greater is set to 0 after decompression. The same problem occurs when I run the code on my research cluster.

Calling `go version` on my laptop prints:
```
$ go version
go version go1.5 darwin/amd64
```
Calling `go version` on the cluster prints:
```
$ go version
go version go1.3.3 linux/amd64
```
Below is a demonstrative test file that I wrote. It generates random slices of different lengths, compresses them, and then decompresses them. It checks that no call returns an error, and that the original and decompressed slices are identical:
```go
package compress

import (
	"bytes"
	"compress/gzip"
	"math/rand"
	"testing"
)

func byteSliceEq(xs, ys []byte) bool {
	if len(xs) != len(ys) {
		return false
	}
	for i := range xs {
		if xs[i] != ys[i] {
			return false
		}
	}
	return true
}

func TestGzip(t *testing.T) {
	tests := []struct {
		n int
	}{
		{1 << 10},
		{1 << 15},
		{1<<15 + 1},
		{1 << 20},
	}

	rand.Seed(0)

	for i := range tests {
		n := tests[i].n
		in, out := make([]byte, n), make([]byte, n)
		buf := &bytes.Buffer{}
		for i := range in {
			in[i] = byte(rand.Intn(256))
		}

		writer := gzip.NewWriter(buf)
		_, err := writer.Write(in)
		if err != nil {
			t.Errorf("%d) n = %d: writer.Write() error: %s",
				i+1, n, err.Error())
		}
		err = writer.Close()
		if err != nil {
			t.Errorf("%d) n = %d: writer.Close() error: %s",
				i+1, n, err.Error())
		}

		reader, err := gzip.NewReader(buf)
		if err != nil {
			t.Errorf("%d) n = %d: gzip.NewReader error: %s",
				i+1, n, err.Error())
		}
		reader.Read(out)
		err = reader.Close()
		if err != nil {
			t.Errorf("%d) n = %d: reader.Close() error: %s",
				i+1, n, err.Error())
		}

		if !byteSliceEq(in, out) {
			idx := -1
			for i := range in {
				if in[i] != out[i] {
					idx = i
					break
				}
			}
			t.Errorf("%d) n = %d: in[%d] = %d, but out[%d] = %d",
				i+1, n, idx, in[idx], idx, out[idx])
		}
	}
}
```
When I run this test, I get the following output:
```
$ go test --run "TestGzip"
--- FAIL: TestGzip (0.12s)
	gzip_test.go:77: 3) n = 32769: in[32768] = 78, but out[32768] = 0
	gzip_test.go:77: 4) n = 1048576: in[32768] = 229, but out[32768] = 0
FAIL
exit status 1
```
Does anyone know what is going on here? Am I misusing the package in some way? Let me know if I haven't given enough information.
Answer 1
Score: 5
The problem is in this line:

```go
reader.Read(out)
```

There is no guarantee that `Reader.Read()` will read the whole `out` slice in one step. `gzip.Reader.Read()` implements `io.Reader.Read()`. Quoting from its documentation (the "general contract"):
> `Read(p []byte) (n int, err error)`
>
> Read reads up to `len(p)` bytes into `p`.
There is no guarantee that `Reader.Read()` will read until `out` is filled; it may stop after fewer bytes if the implementation so chooses, even before EOF is reached. With a "big" slice this can easily happen once the implementation's internal buffer is exhausted. `Read()` returns the number of bytes read (and an `error`), which you can use to check whether the full slice was read.
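For illustration, here is a minimal sketch of such a check, assuming the `reader` and `out` from the test above plus the `io` import: keep calling `Read()` on the unread remainder of `out` until the slice is full or the stream ends.

```go
// Read repeatedly, advancing by the byte count each call returns,
// until out is full or the stream ends.
total := 0
for total < len(out) {
	n, err := reader.Read(out[total:])
	total += n
	if err == io.EOF {
		break // stream ended; total may still be short of len(out)
	}
	if err != nil {
		t.Fatalf("reader.Read() error: %v", err)
	}
}
```

This is essentially the loop that `io.ReadFull()` runs for you.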
Or better yet, use `io.ReadFull()` to make sure `out` is read fully:
```go
if _, err = io.ReadFull(reader, out); err != nil {
	t.Errorf("Error reading full out slice: %v", err)
}
```
By applying this change, your test passes.
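As an aside, here is a sketch of an alternative, assuming the same test setup and the `io/ioutil` import: drain the reader to EOF with `ioutil.ReadAll()`, which loops over short reads internally, and compare with the standard library's `bytes.Equal()` instead of the hand-rolled `byteSliceEq()`. With this approach the output slice does not need to be preallocated at all.

```go
// ReadAll keeps reading through short reads and returns all data up to EOF.
got, err := ioutil.ReadAll(reader)
if err != nil {
	t.Fatalf("ioutil.ReadAll error: %v", err)
}
// bytes.Equal performs the same element-wise comparison as byteSliceEq.
if !bytes.Equal(in, got) {
	t.Errorf("n = %d: decompressed output differs from input", n)
}
```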