February 11, 2022 22:47:37 · go · 97 views

What exactly is the buffer (last parameter) in io.CopyBuffer(...)?

Question

I understand it's handy to reuse a buffer rather than allocating one every time when using io.Copy. However, having printed its value several times, I get all zeros, and the size of my buffer never changes. I tried setting the size to 8 and to 1.

On a related note, to what value should I set my buffer size?

Answer 1

Score: 1

io.CopyBuffer() documents that:

> func CopyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error)
>
> CopyBuffer is identical to Copy except that it stages through the provided buffer (if one is required) rather than allocating a temporary one. If buf is nil, one is allocated; otherwise if it has zero length, CopyBuffer panics.
>
> If either src implements WriterTo or dst implements ReaderFrom, buf will not be used to perform the copy.
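
The two special cases in the quoted documentation are easy to check. In this sketch (the helper `copyWith` is hypothetical), a nil buf is accepted and CopyBuffer allocates one if it needs it, while a non-nil slice of length zero makes CopyBuffer panic, even though strings.Reader implements io.WriterTo and the buffer would not be used anyway.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// copyWith is a hypothetical helper that runs io.CopyBuffer with buf and
// reports whether the call panicked instead of returning normally.
func copyWith(buf []byte) (out string, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	var dst strings.Builder
	// strings.Reader never returns a non-EOF error, so the error is ignored here.
	io.CopyBuffer(&dst, strings.NewReader("hello"), buf)
	return dst.String(), false
}

func main() {
	fmt.Println(copyWith(nil))             // nil buf: CopyBuffer allocates one if needed
	fmt.Println(copyWith(make([]byte, 8))) // non-empty buf: accepted
	fmt.Println(copyWith([]byte{}))        // non-nil, zero-length buf: panics
}
```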

So io.CopyBuffer() copies data (bytes) from src to dst. The source is an io.Reader and the destination is an io.Writer. These interfaces allow you to read and write slices of bytes ([]byte).

In the general case, to do the copying we need a slice to read into from the source, whose contents we can then write to the destination. So io.CopyBuffer() needs a buffer. The buf param lets you pass a byte slice if you already have one; if you do, that buffer is used to do the job, so no new slice has to be allocated (one that would be thrown away at the end of the operation).

What size should it be? Bigger is generally better, but it need not be bigger than the data you want to copy. A bigger buffer obviously requires more memory, so there's a trade-off. Typically a few KB is a good compromise.

Note that, as documented, if the source implements io.WriterTo or the destination implements io.ReaderFrom, those interfaces allow reading / writing without passing a slice, so in that case the buffer you pass is not used, as in this example:

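```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

func main() {
	srcData := []byte{1, 2, 3, 4, 5, 6, 7}
	src := bytes.NewBuffer(srcData)
	dst := &bytes.Buffer{}
	buf := make([]byte, 10)

	io.CopyBuffer(dst, src, buf) // *bytes.Buffer implements io.WriterTo, so buf is unused

	fmt.Println(srcData)
	fmt.Println(dst.Bytes())
	fmt.Println(buf)
}
```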

Which outputs (try it on the Go Playground):

```
[1 2 3 4 5 6 7]
[1 2 3 4 5 6 7]
[0 0 0 0 0 0 0 0 0 0]
```

Since we used bytes.Buffer as the source and destination (and since it implements both io.ReaderFrom and io.WriterTo), the buffer is not used.

Let's construct a source and destination that do not implement these interfaces, so we can test if / how our passed buffer is used.

For this, I will embed *bytes.Buffer in a struct, but add WriteTo and ReadFrom fields, so those methods are not promoted from the embedded *bytes.Buffer (a field at a shallower depth shadows a promoted method of the same name):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

func main() {
	srcData := []byte{1, 2, 3, 4, 5, 6, 7}
	src := struct {
		WriteTo int // "disable" the WriteTo method: this field shadows it
		*bytes.Buffer
	}{0, bytes.NewBuffer(srcData)}

	dst := struct {
		ReadFrom int // "disable" the ReadFrom method: this field shadows it
		*bytes.Buffer
	}{0, &bytes.Buffer{}}

	buf := make([]byte, 10)

	io.CopyBuffer(dst, src, buf)

	fmt.Println(srcData)
	fmt.Println(dst.Bytes())
	fmt.Println(buf)
}
```

This will output (try it on the Go Playground):

```
[1 2 3 4 5 6 7]
[1 2 3 4 5 6 7]
[1 2 3 4 5 6 7 0 0 0]
```

As you can see, the data from the source was read into the buffer, which then was written to the destination.

Note that you may pass a buffer smaller than the data to be copied, in which case reading / writing is done in several iterations. In such cases the buffer may hold only the data from the last iteration, and that data may be partial (if the copied size is not an integer multiple of the buffer size). It also depends on how the source's Read() method is implemented, as Read() is not required to fill the full slice passed to it.

Also note that io.CopyBuffer() does not document that the data written to the passed buffer is retained; it may get cleared / zeroed. Although such clearing is not done (for performance reasons), you should not count on the buffer holding valid data after io.CopyBuffer() returns.

Answer 2

Score: -1

When copying with io.CopyBuffer in Go, the buffer size can affect performance: a larger buffer means fewer Read and Write calls (and, for files or network connections, typically fewer system calls). However, the buffer size does not determine how much data is copied; it only affects the efficiency of the copying process.

The buffer size is typically chosen based on the expected input/output size and the characteristics of the underlying system. There is no fixed rule for selecting the buffer size, as it depends on various factors such as the nature of the data being processed, the available memory, and the performance requirements of your specific use case.

If the buffer is too small, the copy needs many more Read and Write calls, which reduces the potential performance gains. On the other hand, if the buffer is too large, it may lead to unnecessary memory consumption.
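
To see why a tiny buffer hurts, one can count Read calls. The sketch below wraps a strings.Reader in a hypothetical countingReader (which has no WriteTo method, so the buffer is used) and copies 10 KB with several buffer sizes. In memory these calls are cheap, but for files or sockets each Read typically becomes a system call.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// countingReader is a hypothetical wrapper that counts how often Read is called.
// It has no WriteTo method, so io.CopyBuffer must stage through the buffer.
type countingReader struct {
	r     io.Reader
	calls int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	return c.r.Read(p)
}

// readCalls copies size bytes through a buffer of bufSize bytes and returns the
// number of Read calls made on the source.
func readCalls(size, bufSize int) int {
	src := &countingReader{r: strings.NewReader(strings.Repeat("x", size))}
	var dst strings.Builder // strings.Builder has no ReadFrom method
	io.CopyBuffer(&dst, src, make([]byte, bufSize))
	return src.calls
}

func main() {
	for _, bs := range []int{1, 512, 4096, 32 * 1024} {
		fmt.Printf("buffer %6d bytes: %5d Read calls for 10 KB\n", bs, readCalls(10*1024, bs))
	}
}
```

With strings.Reader, which fills the whole slice on each call and reports io.EOF in a separate final call, this gives 10241, 21, 4 and 2 Read calls respectively.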

To determine an appropriate buffer size, you can consider the following guidelines:

1. Start with a reasonable default size, such as 4096 bytes (4 KB), which is a common choice.
2. Measure the performance of your code with different buffer sizes, for example with benchmarks written using Go's testing package (go test -bench), comparing execution time and resource utilization.
3. Adjust the buffer size based on the results. If increasing it improves performance, try larger values; if decreasing it has no significant impact, you can try smaller values.

Remember that the buffer size is not directly related to the size of the data being copied, but rather affects the efficiency of the copying process. Experimentation and performance profiling can help you determine the optimal buffer size for your specific use case.
