Cannot gzip slices with more than 32768 bytes in Go 1.5 on Mac OS X
Question
I am trying to compress byte slices in Go using `compress/gzip`. Whenever I compress slices with lengths longer than 2^15 on my laptop, every byte with an index of 2^15 or greater is set to 0 after decompression. The same problem occurs when I run the code on my research cluster.

Calling `go version` on my laptop prints:
```
$ go version
go version go1.5 darwin/amd64
```
Calling `go version` on the cluster prints:
```
$ go version
go version go1.3.3 linux/amd64
```
Below is a demonstrative test file that I wrote. It generates random slices of different lengths, compresses them, and then decompresses them. It checks that no call returns an error, and that the original and decompressed slices are identical:
```go
package compress

import (
	"bytes"
	"compress/gzip"
	"math/rand"
	"testing"
)

func byteSliceEq(xs, ys []byte) bool {
	if len(xs) != len(ys) {
		return false
	}
	for i := range xs {
		if xs[i] != ys[i] {
			return false
		}
	}
	return true
}

func TestGzip(t *testing.T) {
	tests := []struct {
		n int
	}{
		{1 << 10},
		{1 << 15},
		{1<<15 + 1},
		{1 << 20},
	}

	rand.Seed(0)

	for i := range tests {
		n := tests[i].n
		in, out := make([]byte, n), make([]byte, n)
		buf := &bytes.Buffer{}
		for i := range in {
			in[i] = byte(rand.Intn(256))
		}

		writer := gzip.NewWriter(buf)
		_, err := writer.Write(in)
		if err != nil {
			t.Errorf("%d) n = %d: writer.Write() error: %s",
				i+1, n, err.Error())
		}
		err = writer.Close()
		if err != nil {
			t.Errorf("%d) n = %d: writer.Close() error: %s",
				i+1, n, err.Error())
		}

		reader, err := gzip.NewReader(buf)
		if err != nil {
			t.Errorf("%d) n = %d: gzip.NewReader error: %s",
				i+1, n, err.Error())
		}
		reader.Read(out)
		err = reader.Close()
		if err != nil {
			t.Errorf("%d) n = %d: reader.Close() error: %s",
				i+1, n, err.Error())
		}

		if !byteSliceEq(in, out) {
			idx := -1
			for i := range in {
				if in[i] != out[i] {
					idx = i
					break
				}
			}
			t.Errorf("%d) n = %d: in[%d] = %d, but out[%d] = %d",
				i+1, n, idx, in[idx], idx, out[idx])
		}
	}
}
```
When I run this test, I get the following output:
```
$ go test --run "TestGzip"
--- FAIL: TestGzip (0.12s)
	gzip_test.go:77: 3) n = 32769: in[32768] = 78, but out[32768] = 0
	gzip_test.go:77: 4) n = 1048576: in[32768] = 229, but out[32768] = 0
FAIL
exit status 1
```
Does anyone know what is going on here? Am I misusing the package in some way? Let me know if I haven't given enough information.
Answer 1
Score: 5
The problem is in this line:

```go
reader.Read(out)
```

There is no guarantee that `Reader.Read()` will read the whole `out` slice in one step. `gzip.Reader.Read()` implements `io.Reader.Read()`. Quoting from its documentation (the "general contract"):
> `Read(p []byte) (n int, err error)`
>
> Read reads up to `len(p)` bytes into `p`.
There is no guarantee that `Reader.Read()` will read until `out` is filled; it may stop after fewer bytes if the implementation so chooses, even before EOF is reached. With a "big" slice this can easily happen once the implementation's internal buffer is exhausted. `Read()` returns the number of bytes read (and an `error`), which you can use to check whether the full slice was read.
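For illustration, here is a minimal sketch of such a check, assuming the `reader` and `out` from the test above plus the `io` import: keep calling `Read()` on the unread remainder of `out` until the slice is full or the stream ends.

```go
// Read repeatedly, advancing by the byte count each call returns,
// until out is full or the stream ends.
total := 0
for total < len(out) {
	n, err := reader.Read(out[total:])
	total += n
	if err == io.EOF {
		break // stream ended; total may still be short of len(out)
	}
	if err != nil {
		t.Fatalf("reader.Read() error: %v", err)
	}
}
```

This is essentially the loop that `io.ReadFull()` runs for you.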
Or better yet, use `io.ReadFull()` to make sure `out` is read fully:
```go
if _, err = io.ReadFull(reader, out); err != nil {
	t.Errorf("Error reading full out slice: %v", err)
}
```
By applying this change, your test passes.
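As an aside, here is a sketch of an alternative, assuming the same test setup and the `io/ioutil` import: drain the reader to EOF with `ioutil.ReadAll()`, which loops over short reads internally, and compare with the standard library's `bytes.Equal()` instead of the hand-rolled `byteSliceEq()`. With this approach the output slice does not need to be preallocated at all.

```go
// ReadAll keeps reading through short reads and returns all data up to EOF.
got, err := ioutil.ReadAll(reader)
if err != nil {
	t.Fatalf("ioutil.ReadAll error: %v", err)
}
// bytes.Equal performs the same element-wise comparison as byteSliceEq.
if !bytes.Equal(in, got) {
	t.Errorf("n = %d: decompressed output differs from input", n)
}
```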