Cannot gzip slices with more than 32768 bytes in Go 1.5 on Mac OS X

Question

I am trying to compress byte slices in Go using compress/gzip. Whenever I compress slices with lengths longer than 2^15 on my laptop, every byte with an index of 2^15 or greater is set to 0 after decompression. When I run the same code on my research cluster it also breaks.

Calling go version on my laptop prints:

$ go version
go version go1.5 darwin/amd64

Calling go version on the cluster prints:

$ go version
go version go1.3.3 linux/amd64

Below is a test file I wrote to demonstrate the problem. It generates random slices of different lengths, compresses them, then decompresses them. It checks that no call returns an error and that the original and decompressed slices are identical:

package compress

import (
	"bytes"
	"compress/gzip"
	"math/rand"
	"testing"
)

func byteSliceEq(xs, ys []byte) bool {
	if len(xs) != len(ys) { return false }
	for i := range xs {
		if xs[i] != ys[i] { return false }
	}
	return true
}

func TestGzip(t *testing.T) {
	tests := []struct {
		n int
	}{
		{ 1<<10 },
		{ 1<<15 },
		{ 1<<15 + 1 },
		{ 1<<20 },

	}

	rand.Seed(0)

	for i := range tests {
		n := tests[i].n

		in, out := make([]byte, n), make([]byte, n)
		buf := &bytes.Buffer{}
		for i := range in { in[i] = byte(rand.Intn(256)) }

		writer := gzip.NewWriter(buf)
		_, err := writer.Write(in)
		if err != nil {
			t.Errorf("%d) n = %d: writer.Write() error: %s",
				i + 1, n, err.Error())
		}
		err = writer.Close()
		if err != nil {
			t.Errorf("%d) n = %d: writer.Close() error: %s",
				i + 1, n, err.Error())
		}

		reader, err := gzip.NewReader(buf)
		if err != nil {
			t.Errorf("%d) n = %d: gzip.NewReader error: %s",
				i + 1, n, err.Error())
		}
		reader.Read(out)
		err = reader.Close()
		if err != nil {
			t.Errorf("%d) n = %d: reader.Close() error: %s",
				i + 1, n, err.Error())
		}

		if !byteSliceEq(in, out) {
			idx := -1
			for i := range in {
				if in[i] != out[i] {
					idx = i
					break
				}
			}
			t.Errorf("%d) n = %d: in[%d] = %d, but out[%d] = %d",
				i + 1, n, idx, in[idx], idx, out[idx])
		}
	}
}

When I run this test, I get the following output:

$ go test --run "TestGzip"
--- FAIL: TestGzip (0.12s)
	gzip_test.go:77: 3) n = 32769: in[32768] = 78, but out[32768] = 0
	gzip_test.go:77: 4) n = 1048576: in[32768] = 229, but out[32768] = 0
FAIL
exit status 1

Does anyone know what is going on here? Am I misusing the package in some way? Let me know if I haven't given enough information.

Answer 1

Score: 5

The problem is in this line:

reader.Read(out)

There is no guarantee that Reader.Read() will read the whole out slice in one step.

gzip.Reader.Read() implements io.Reader.Read().
Quoting from the io.Reader documentation (the "general contract"):

> Read(p []byte) (n int, err error)
>
> Read reads up to len(p) bytes into p.

Reader.Read() is allowed to stop before out is filled, even if EOF has not been reached. With a "big" slice this happens easily, for example when an internal buffer of the implementation is exhausted. Read() returns the number of bytes read (and an error); you can use that count to check whether the full slice was read.
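To make the contract concrete, here is a minimal sketch of calling Read in a loop and counting how many calls it takes to fill a slice. The helper names fill and roundTrip are made up for this illustration and are not part of any package; how many bytes each individual Read returns depends on the gzip implementation, so no particular count is assumed.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// fill (hypothetical helper) calls r.Read repeatedly until p is full or
// Read returns an error. It reports the bytes stored and the Read calls made.
func fill(r io.Reader, p []byte) (n, calls int, err error) {
	for n < len(p) && err == nil {
		var m int
		m, err = r.Read(p[n:]) // may return fewer than len(p[n:]) bytes
		n += m
		calls++
	}
	if n == len(p) {
		err = nil // buffer is full; drop a trailing error such as io.EOF
	}
	return n, calls, err
}

// roundTrip (hypothetical helper) gzips data into memory, then decompresses
// it into a slice of the same length using fill.
func roundTrip(data []byte) ([]byte, int, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, 0, err
	}
	if err := zw.Close(); err != nil {
		return nil, 0, err
	}

	zr, err := gzip.NewReader(&buf)
	if err != nil {
		return nil, 0, err
	}
	defer zr.Close()

	out := make([]byte, len(data))
	_, calls, err := fill(zr, out)
	return out, calls, err
}

func main() {
	in := make([]byte, 1<<20)
	for i := range in {
		in[i] = byte(i * 31 % 251) // deterministic filler data
	}
	out, calls, err := roundTrip(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("filled %d bytes in %d Read calls, equal=%v\n",
		len(out), calls, bytes.Equal(in, out))
}
```

If the program reports more than one Read call, a single reader.Read(out) would have left the rest of out at its zero value, which matches the zeros seen in the failing test.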

Better still, use io.ReadFull() to make sure out is filled completely:

// Requires "io" in the import block.
if _, err = io.ReadFull(reader, out); err != nil {
    t.Errorf("Error reading full out slice: %v", err)
}

With this change, your test passes.
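For reference, a self-contained sketch of the fix outside the test harness might look like the following. The helper names gzipBytes, gunzipN and gunzipAll are invented for this example. gunzipN uses io.ReadFull when the decompressed length is known, as in the test above; gunzipAll is an alternative, not taken from the answer, that uses io.ReadAll (Go 1.16 and later; older releases have ioutil.ReadAll) when the length is not known. The compress/gzip documentation advises treating data from Read as tentative until io.EOF is received, so reading to EOF has the side benefit of letting the reader reach the end of the stream.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
)

// gzipBytes (hypothetical helper) compresses data into an in-memory buffer.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes pending data and writes the gzip trailer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipN (hypothetical helper) decompresses z into exactly n bytes.
// io.ReadFull keeps calling Read until the slice is full or an error occurs.
func gunzipN(z []byte, n int) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(z))
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	out := make([]byte, n)
	if _, err := io.ReadFull(zr, out); err != nil {
		return nil, err
	}
	return out, nil
}

// gunzipAll (hypothetical helper) decompresses z until EOF, for cases where
// the decompressed length is not known in advance.
func gunzipAll(z []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(z))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	in := make([]byte, 1<<20)
	for i := range in {
		in[i] = byte(i * 31 % 251) // deterministic filler data
	}

	z, err := gzipBytes(in)
	if err != nil {
		log.Fatal(err)
	}
	a, err := gunzipN(z, len(in))
	if err != nil {
		log.Fatal(err)
	}
	b, err := gunzipAll(z)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("ReadFull matches:", bytes.Equal(in, a))
	fmt.Println("ReadAll matches: ", bytes.Equal(in, b))
}
```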

huangapple
  • Posted on March 23, 2016, 03:32:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/36163519.html