How should I add buffering to a gzip writer?

Question

I noticed the gzip package uses bufio internally for reading gzipped files, but not for writing them. I know that buffering is important for I/O performance, so what is the proper way to buffer a gzip writer?

// ignoring error handling for this example
// (the three alternatives are mutually exclusive)
outFile, _ := os.Create("output.gz")

// Alternative 1 - bufio.Writer wraps gzip.Writer
gzipWriter := gzip.NewWriter(outFile)
writer := bufio.NewWriter(gzipWriter)

// Alternative 2 - gzip.Writer wraps bufio.Writer
writer := bufio.NewWriter(outFile)
gzipWriter := gzip.NewWriter(writer)

// Alternative 3 - replace bufio with bytes.Buffer
var buf bytes.Buffer
gzipWriter := gzip.NewWriter(&buf)

Also, do I need to Flush() the gzip writer or the bufio writer (or both) before closing it, or will closing it automatically flush the writer?

UPDATE: I now understand that both reads and writes are buffered with gzip. So buffering a gzip.Writer is really double buffering. @peterSO thinks this is redundant. @Steven Weinberg thinks double buffering may reduce the number of syscalls, but suggests benchmarking to be sure.

Answer 1

Score: 5

The proper way to use bufio is to wrap a writer with a high overhead for each call to write. This is the case for any writer that requires syscalls. In this case, your "outFile" is an OS file and each write is a syscall.

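// src is any io.Reader that supplies the uncompressed data.
outFile, err := os.Create("output.gz")
if err != nil {
	return err
}
defer outFile.Close() // runs last: release the file

buf := bufio.NewWriter(outFile)
defer buf.Flush() // runs second: write the buffered bytes to the file

gz := gzip.NewWriter(buf)
defer gz.Close() // runs first: flush gzip and write its trailer into buf

_, err = io.Copy(gz, src)
return err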

In this case, we are grouping writes to outFile with bufio so as to avoid unnecessary syscalls. The order is src -> gzip -> buffer -> file.

Now, when we finish writing, there are several layers to shut down, in order. First we tell gzip we are done, so it can flush its internal buffers and write its trailer (a checksum and the uncompressed length) into the bufio.Writer. Then we tell the bufio.Writer we are done, so it writes out whatever it was still holding for the next batch. Finally, we tell the OS we are done with the file.

This destruction happens in the opposite order of creation, so we can use defers to make it easier. On return, the defers are executed in reverse order so we know we are flushing in the proper order because the defers for destruction are right next to the function calls for creation.

huangapple
  • Posted on August 7, 2014, 06:35:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/25171385.html