Write operation cost

Question

I have a Go program that writes strings to a file. A loop runs 20000 times, and each iteration writes around 20-30 strings to the file. I want to know the best way to write them.

  • Approach 1: Open the file at the start of the program and write
    each string as it is produced. That makes 20000*30 write operations.

  • Approach 2: Use Go's bytes.Buffer, store everything in the buffer, and
    write it out at the end. In this case, should the file be opened at
    the beginning of the program or at the end? Does it matter?

I assume approach 2 should work better. Can someone confirm this and explain why? Why is writing everything at once better than writing periodically, given that the file is open either way?
I am using f.WriteString(<string>) and buffer.WriteString(<some string>), where buffer is of type bytes.Buffer and f is the open file.

Answer 1

Score: 8

The bufio package was created for exactly this kind of task. Instead of making a syscall for each Write call, bufio.Writer buffers up to a fixed number of bytes in memory before making a syscall. After the syscall, the internal buffer is reused for the next portion of data.

Compared to your second approach, bufio.Writer

  • makes more syscalls (N/S instead of 1)
  • uses less memory (S bytes instead of N bytes)

where S is the buffer size (which can be specified via bufio.NewWriterSize) and N is the total size of the data to be written.
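
To make the N/S figure concrete, here is a minimal sketch that counts how many times the underlying writer is called with and without bufio.Writer. The countingWriter type and the helper functions are hypothetical names introduced for illustration; they stand in for a real file, where each Write call would correspond to a write syscall.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter is a hypothetical stand-in for a file: it discards the
// data and only counts how often Write is called and how many bytes it saw.
type countingWriter struct {
	calls int
	bytes int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	c.bytes += len(p)
	return len(p), nil
}

// writeAll writes n copies of s to w.
func writeAll(w io.Writer, s string, n int) error {
	for i := 0; i < n; i++ {
		if _, err := io.WriteString(w, s); err != nil {
			return err
		}
	}
	return nil
}

// unbufferedCalls returns the number of Write calls that reach the
// destination when every string is written directly (approach 1).
func unbufferedCalls(s string, n int) int {
	cw := &countingWriter{}
	_ = writeAll(cw, s, n) // countingWriter never returns an error
	return cw.calls
}

// bufferedCalls returns the number of Write calls that reach the
// destination when a bufio.Writer with a buffer of bufSize bytes sits
// in front of it.
func bufferedCalls(s string, n, bufSize int) int {
	cw := &countingWriter{}
	bw := bufio.NewWriterSize(cw, bufSize)
	_ = writeAll(bw, s, n)
	_ = bw.Flush() // push out whatever is still in the buffer
	return cw.calls
}

func main() {
	const s = "0123456789abcdef" // a 16-byte sample string (assumption)
	const n = 20000 * 30         // string count taken from the question

	fmt.Println("unbuffered Write calls:", unbufferedCalls(s, n))
	fmt.Println("buffered Write calls:  ", bufferedCalls(s, n, 4096))
}
```

The buffered count should come out at roughly N/S rounded up, since bufio.Writer only forwards data when its buffer is full or when Flush is called.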

Example usage (https://play.golang.org/p/AvBE1d6wpT):

f, err := os.Create("file.txt")
if err != nil {
	log.Fatal(err)
}
defer f.Close()

w := bufio.NewWriter(f)
fmt.Fprint(w, "Hello, ")
fmt.Fprint(w, "world!")
err = w.Flush() // Don't forget to flush!
if err != nil {
	log.Fatal(err)
}

Answer 2

Score: 4

The operations that take time when writing to files are the syscalls and the disk I/O. Keeping the file open doesn't cost you anything. So naively, we could say that the second method is best.
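
As a rough illustration of the second method, the sketch below collects everything in a bytes.Buffer and touches the file only once the data is ready, which also suggests that it makes little difference when the file is opened. The function names, the sample line format, and the temporary file name are assumptions made for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// buildOutput accumulates every string in memory (approach 2).
// The "line i-j" format is only a placeholder for the real strings.
func buildOutput(iterations, perIteration int) *bytes.Buffer {
	var buf bytes.Buffer
	for i := 0; i < iterations; i++ {
		for j := 0; j < perIteration; j++ {
			fmt.Fprintf(&buf, "line %d-%d\n", i, j)
		}
	}
	return &buf
}

// writeOnce creates the file only after all data has been built and
// passes the whole buffer to it in a single os.WriteFile call.
func writeOnce(path string, buf *bytes.Buffer) error {
	return os.WriteFile(path, buf.Bytes(), 0o644)
}

func main() {
	buf := buildOutput(20000, 30)

	// Hypothetical output location, used only for this example.
	path := filepath.Join(os.TempDir(), "approach2-example.txt")
	if err := writeOnce(path, buf); err != nil {
		log.Fatal(err)
	}
	defer os.Remove(path) // clean up the example file

	fmt.Printf("wrote %d bytes to %s\n", buf.Len(), path)
}
```

The trade-off is memory: the whole output lives in the buffer until the final write.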

Now, as you may know, your OS doesn't write directly to files; it keeps an internal in-memory cache for written files and does the real I/O later. I don't know the exact details of that, and generally speaking I don't need to.

What I would advise is a middle-ground solution: build a buffer for every loop iteration and write it out at the end of that iteration, so the file is written once per iteration rather than once per string. That cuts a big part of the number of syscalls and (potentially) disk writes without consuming too much memory for the buffer (depending on the size of your strings, that may be a point to take into account).

I would suggest benchmarking to find the best solution, but due to the caching done by the system, benchmarking disk I/O is a real nightmare.

Answer 3

Score: 1

Syscalls are not cheap, so the second approach is better.

You can use the lat_syscall tool from lmbench to measure how long a single write call takes:

$ ./lat_syscall write
Simple write: 0.1522 microseconds

So on my system, one write call per loop iteration costs approximately 20000 * 0.15 μs = 3 ms of extra time; calling write for every string (20-30 per iteration) costs roughly 60-90 ms.
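
The estimate above is simple enough to reproduce in a few lines. This sketch only multiplies the number of calls by the measured per-call latency; the function name is hypothetical, and the 0.1522 μs figure is the lat_syscall result quoted above, which will differ on other systems.

```go
package main

import "fmt"

// syscallOverheadMs estimates the time, in milliseconds, spent purely on
// write syscall overhead, given the number of calls and the per-call
// latency in microseconds.
func syscallOverheadMs(calls int, perCallMicros float64) float64 {
	return float64(calls) * perCallMicros / 1000
}

func main() {
	const perCall = 0.1522 // microseconds, from the lat_syscall run above

	fmt.Printf("one write per iteration: %.1f ms\n", syscallOverheadMs(20000, perCall))
	fmt.Printf("one write per string:    %.1f ms\n", syscallOverheadMs(20000*30, perCall))
}
```

This covers the syscall overhead only; the cost of the actual disk I/O is not part of the estimate.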

huangapple
  • Posted on January 6, 2016, 16:18:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/34628476.html