Compressing data from a reader in Go

Question

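This program deadlocks because of how compress uses io.Pipe. io.Copy reads from data and writes into gw (a gzip.Writer), and gw writes its compressed output into pw, the write end of the pipe. io.Pipe is synchronous and has no internal buffer. Every write to the PipeWriter blocks until a reader consumes the data from the PipeReader, so the two ends must be used concurrently, by different goroutines.

Here, io.Copy blocks on the write end, waiting for some other goroutine to read from pr. compress never starts such a goroutine, and the caller cannot read pr until compress returns, so every goroutine is blocked and the runtime reports a deadlock.

To fix this, run the copy into gw in a new goroutine inside compress and return pr immediately. The caller then reads the compressed data from pr while the goroutine writes it. Here is the modified code: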

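func compress(data io.Reader) (io.Reader, error) {
    pr, pw := io.Pipe()
    gw := gzip.NewWriter(pw)

    go func() {
        _, err := io.Copy(gw, data)
        // Close gw before pw: gw.Close writes the gzip footer into the pipe.
        if cerr := gw.Close(); err == nil {
            err = cerr
        }
        // Report any error to whoever reads pr; a nil error closes normally (EOF).
        pw.CloseWithError(err)
    }()

    return pr, nil
}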

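With this change, compress starts a goroutine that copies data into gw and closes gw and pw when it finishes. compress itself returns pr immediately, so you can read the compressed data from pr.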
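A minimal, self-contained sketch of the goroutine-plus-pipe approach, assuming Go 1.16 or later (for io.ReadAll). Unlike the function above, this compress returns only an io.Reader; errors reach the caller through reads on that reader via CloseWithError. The helper roundTrip is hypothetical and exists only to check the output by feeding the compressed stream into gzip.NewReader. The close order matters: gw.Close writes the gzip footer into the pipe, so gw must be closed before pw. Writing `defer gw.Close()` followed by `defer pw.Close()` would close pw first, because deferred calls run last-in, first-out, and the footer write would then fail.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compress returns a reader that yields the gzip-compressed form of data.
// The copy runs in its own goroutine so that the pipe's write end and
// read end are used concurrently, which io.Pipe requires.
func compress(data io.Reader) io.Reader {
	pr, pw := io.Pipe()
	go func() {
		gw := gzip.NewWriter(pw)
		_, err := io.Copy(gw, data)
		// Close the gzip writer first so its footer is flushed into the pipe.
		if cerr := gw.Close(); err == nil {
			err = cerr
		}
		// CloseWithError(nil) acts like Close (reader sees io.EOF);
		// a non-nil error is returned to the reader of pr instead.
		pw.CloseWithError(err)
	}()
	return pr
}

// roundTrip (hypothetical helper) compresses s through the pipe and
// decompresses it again.
func roundTrip(s string) (string, error) {
	zr, err := gzip.NewReader(compress(bytes.NewReader([]byte(s))))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return string(out), err
}

func main() {
	s, err := roundTrip("hello world")
	fmt.Println(s, err) // hello world <nil>
}
```

Because the goroutine only produces output as fast as the caller reads it, the whole input is never held in memory at once.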

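As for the most efficient way to compress data from an io.Reader: compress/gzip, which you are already using, is a good choice. Note that compress/zlib (zlib.Writer) and compress/flate use the same DEFLATE algorithm as gzip, only with a different header and checksum (or none), so switching to them mainly changes the container format rather than the compression ratio or speed. Within the standard library, the main trade-off between speed and ratio is the compression level passed to gzip.NewWriterLevel. Results vary between data sets, so test and compare on your own data.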

Original English text:

I have the following short program written in Go, which attempts to transparently compress the data in a reader (https://play.golang.org/p/SnvYT6it5r):

package main

import (
	"fmt"
	"io"
	"bytes"
	"compress/gzip"
)

func main() {
	data := bytes.NewReader([]byte("hello world"))
	compress(data)
}

func compress(data io.Reader) (io.Reader, error) {
	pr, pw := io.Pipe()
	gw := gzip.NewWriter(pw)
	
	n, err := io.Copy(gw, data)

	if err != nil {
		fmt.Printf("error: %s", err.Error())
	} else {
		fmt.Printf("%d bytes compressed", n)
	}
	return pr, err
}

When I run it, the program deadlocks and the runtime aborts with:

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [semacquire]:
sync.runtime_notifyListWait(0x1043e6cc, 0x0)
	/usr/local/go/src/runtime/sema.go:297 +0x140
sync.(*Cond).Wait(0x1043e6c4, 0x137118)
	/usr/local/go/src/sync/cond.go:57 +0xc0
io.(*pipe).write(0x1043e680, 0x1045a055, 0xa, 0xa, 0x0, 0x0, 0x0, 0x101)
	/usr/local/go/src/io/pipe.go:90 +0x1a0
io.(*PipeWriter).Write(0x1040c180, 0x1045a055, 0xa, 0xa, 0xe205ef63, 0x34c, 0x0, 0x0)
	/usr/local/go/src/io/pipe.go:157 +0x40
compress/gzip.(*Writer).Write(0x1045a000, 0x1040a130, 0xb, 0x10, 0x2c380, 0x7654, 0x1059e0, 0x111480)
	/usr/local/go/src/compress/gzip/gzip.go:168 +0x2e0
bytes.(*Reader).WriteTo(0x10440240, 0x190610, 0x1045a000, 0x0, 0xfef64000, 0x10440240, 0x1045a001, 0x190670)
	/usr/local/go/src/bytes/reader.go:134 +0xe0
io.copyBuffer(0x190610, 0x1045a000, 0x1905d0, 0x10440240, 0x0, 0x0, 0x0, 0x106620, 0x1045a000, 0x0, ...)
	/usr/local/go/src/io/io.go:380 +0x360
io.Copy(0x190610, 0x1045a000, 0x1905d0, 0x10440240, 0x10440240, 0x0, 0x1a47c0, 0x0)
	/usr/local/go/src/io/io.go:360 +0x60
main.compress(0x1905d0, 0x10440240, 0x10440240, 0x1040c170, 0x1040a130, 0xb)
	/tmp/sandbox403912545/main.go:19 +0x180
main.main()
	/tmp/sandbox403912545/main.go:12 +0xe0

What is causing the deadlock, and what is the most efficient way to compress data from a reader?

Answer 1

Score: 3


You write to io.Pipe but you never read from it (in a parallel goroutine), hence the deadlock. Here is what the docs say:

>Reads and Writes on the pipe are matched one to one except when multiple Reads are needed to consume a single Write. That is, each Write to the PipeWriter blocks until it has satisfied one or more Reads from the PipeReader that fully consume the written data. The data is copied directly from the Write to the corresponding Read (or Reads); there is no internal buffering.

https://golang.org/pkg/io/#Pipe
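To see the behaviour the documentation describes, the sketch below performs one 11-byte Write on an io.Pipe from a separate goroutine and reads it back through a 4-byte buffer, so several Reads are needed to consume that single Write. readInChunks is a hypothetical helper. The Write returns only once the Reads have consumed all of its data, which is why a pipe with no concurrent reader deadlocks.

```go
package main

import (
	"fmt"
	"io"
)

// readInChunks (hypothetical helper) writes msg into an io.Pipe from another
// goroutine and reads it back in chunks of at most size bytes.
func readInChunks(msg string, size int) ([]string, error) {
	pr, pw := io.Pipe()
	go func() {
		// This single Write blocks until the Reads below have consumed all of msg.
		_, err := pw.Write([]byte(msg))
		pw.CloseWithError(err) // nil error: the reader sees io.EOF
	}()

	var chunks []string
	buf := make([]byte, size)
	for {
		n, err := pr.Read(buf)
		if n > 0 {
			chunks = append(chunks, string(buf[:n]))
		}
		if err == io.EOF {
			return chunks, nil
		}
		if err != nil {
			return chunks, err
		}
	}
}

func main() {
	chunks, err := readInChunks("hello world", 4)
	fmt.Printf("%q %v\n", chunks, err) // ["hell" "o wo" "rld"] <nil>
}
```

If the Write were made on the main goroutine with no reader running, it would block forever, which is the same situation as the io.Copy call in the question.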

huangapple
  • This article was posted on July 16, 2017 at 04:35:23.
  • When reposting, please keep the link to this article: https://go.coder-hub.com/45122513.html