Why is reading and writing files in Go much slower than in Perl?
Question
I am using Go to improve the efficiency of my code, but when I read and write files with Go, I find it less efficient than Perl. Is the problem in my code, or is there some other reason?
Build the input file:
    # Input file:
    for i in $(seq 1 600000); do echo SERVER$((RANDOM%800+100)),$RANDOM,$RANDOM,$RANDOM >> sample.csv; done
Read and write files with Perl:
    time cat sample.csv | perl -ne 'chomp;print"$_"' > out.txt
    real 0m0.249s
    user 0m0.083s
    sys 0m0.049s
Read and write files with Go:
    package main

    import (
        "bufio"
        "fmt"
        "io"
        "os"
        "strings"
    )

    func main() {
        filepath := "./sample.csv"
        file, err := os.OpenFile(filepath, os.O_RDWR, 0666)
        if err != nil {
            fmt.Println("Open file error!", err)
            return
        }
        defer file.Close()
        buf := bufio.NewReader(file)
        for {
            line, err := buf.ReadString('\n')
            line = strings.TrimSpace(line)
            fmt.Println(line)
            if err != nil {
                if err == io.EOF {
                    fmt.Println("File read ok!")
                    break
                } else {
                    fmt.Println("Read file error!", err)
                    return
                }
            }
        }
    }
Then I run:
    time go run read.go > out.txt
    real 0m2.332s
    user 0m0.326s
    sys 0m2.038s
Why are reads and writes in Go almost 10 times slower than in Perl?
Answer 1
Score: 15
You're comparing apples to oranges.
At least two methodological errors:
- Your Perl incantation measures how cat reads the file and sends its contents over pipe(2), and perl reads data from there, processes it and writes the results to its stdout.
- Your Go incantation
  - measures a full build pass of the Go toolchain (which includes compilation, linking and writing out an executable image file) and then a run of the compiled program¹, and
  - measures unbuffered writes to stdout (fmt.Print* calls), while in the Perl code, writes to the standard output, to cite the docs, 'typically can be line buffered if output is to the terminal and block buffered otherwise.'
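To make the unbuffered-writes point concrete, here is a minimal sketch that counts how many times the underlying writer is hit with and without bufio.Writer. The countingWriter type and the helper functions are hypothetical names made up for this illustration; each Write call on it stands in for one write(2) system call on a real file, which is an assumption of the sketch rather than a measurement.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter is a hypothetical stand-in for os.Stdout: it discards the
// data and only counts how many times Write is called on it.
type countingWriter struct {
	calls int
}

func (w *countingWriter) Write(p []byte) (int, error) {
	w.calls++
	return len(p), nil
}

// writeLines prints n short lines to w, one fmt.Fprintln call per line.
func writeLines(w io.Writer, n int) {
	for i := 0; i < n; i++ {
		fmt.Fprintln(w, "SERVER123,1,2,3")
	}
}

// unbufferedCalls reports the number of Write calls when printing directly
// to the underlying writer, as fmt.Println does with os.Stdout.
func unbufferedCalls(n int) int {
	cw := &countingWriter{}
	writeLines(cw, n)
	return cw.calls
}

// bufferedCalls reports the number of Write calls that reach the underlying
// writer when the same lines go through a bufio.Writer first.
func bufferedCalls(n int) int {
	cw := &countingWriter{}
	bw := bufio.NewWriter(cw) // default-sized in-memory buffer
	writeLines(bw, n)
	bw.Flush() // push out whatever is still sitting in the buffer
	return cw.calls
}

func main() {
	const n = 600000 // same number of lines as sample.csv
	fmt.Println("unbuffered Write calls:", unbufferedCalls(n))
	fmt.Println("buffered Write calls:", bufferedCalls(n))
}
```

The unbuffered variant reaches the underlying writer once per line, while the buffered one does so only when its buffer fills up or is flushed, which is consistent with the large sys time in the question's measurement.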
Let's try to compare apples to apples.
First, here's a comparable Go implementation:
    package main

    import (
        "bufio"
        "bytes"
        "fmt"
        "os"
    )

    func main() {
        in := bufio.NewScanner(os.Stdin)
        out := bufio.NewWriter(os.Stdout) // buffer writes to stdout

        for in.Scan() {
            // Scan already strips the line terminator; TrimSpace also
            // drops surrounding whitespace, as the original program did.
            s := bytes.TrimSpace(in.Bytes())
            if _, err := out.Write(s); err != nil {
                fmt.Fprintln(os.Stderr, "failed to write file:", err)
                os.Exit(1)
            }
        }

        // Push out whatever is still sitting in the buffer.
        if err := out.Flush(); err != nil {
            fmt.Fprintln(os.Stderr, "failed to write file:", err)
            os.Exit(1)
        }

        if err := in.Err(); err != nil {
            fmt.Fprintln(os.Stderr, "reading failed:", err)
            os.Exit(1)
        }
    }
Let's save it as chomp.go and measure:
- Build the code:
      $ go build chomp.go
- Generate the input file:
      $ for i in $(seq 1 600000); do echo SERVER$((RANDOM%800+100)),$RANDOM,$RANDOM,$RANDOM; done >sample.csv
- Run the Perl code:
      $ time { perl -ne 'chomp; print "$_";' <sample.csv >out1.txt; }
      real 0m0.226s
      user 0m0.102s
      sys 0m0.048s
- Run it again to make sure it had read the input file from the filesystem cache:
      $ time { perl -ne 'chomp; print "$_";' <sample.csv >out1.txt; }
      real 0m0.123s
      user 0m0.090s
      sys 0m0.033s
  Note how the execution time has gone down.
- Run the Go code on the cached input:
      $ time { ./chomp <sample.csv >out2.txt; }
      real 0m0.063s
      user 0m0.032s
      sys 0m0.032s
- Make sure the results are the same:
      $ cmp out1.txt out2.txt
As you can see, on my linux/amd64 system with an SSD the results are in the same ballpark.
I should also state that to get sensible results, you'd need to run each command many times (say, 1000), average the results in each batch, and compare those numbers instead, but I think the above is enough to demonstrate what the problems with your approach were.
One more thing to consider: the run time of these two programs is overwhelmingly dominated by filesystem I/O, so if you thought Go would be faster at that, your expectation was unfounded: both programs spend most of their time sleeping in the kernel's system calls read(2) and write(2). A Go program might be faster than a Perl program in certain cases involving CPU crunching (especially if it's written to take advantage of multi-core systems), but your example is simply not that case.
Oh, and just to make the unstated fact explicit: while the Go language specification does not say anything about how the runtime system of a Go implementation must be done, both existing state-of-the-art Go implementations (one of which you're ostensibly using) rely on ahead-of-time (AOT) compilation, and go run is a hack for one-off throw-away gigs, intended neither for serious work nor for executing code of any serious level of complexity. In simpler words, Go-that-you-are-using is not an interpreted language, even though the availability of go run might make it appear so. In fact, it does what a normal go build would do, then runs the resulting executable file, then throws it away.
¹ You might be tempted to say that Perl also deals with "the source code", but the Perl interpreter is highly optimized for running scripts, whereas Go's build toolchain, while crazy fast in comparison to most other compiled languages, is not optimized for that.
A possibly more glaring distinction is that the Perl interpreter actually interprets your (very simple) script, and chomp and print are so-called "built-ins": functions readily provided to the executing script by the interpreter. Compare that to building a Go program, which involves the compiler parsing the source code file and transforming it into machine code, and the linker reading the files of the Go standard library's compiled packages (all those which are imported), taking bits of code from them, combining all those bits of machine code and writing out an executable image file (which is much like what the perl binary itself is!). Sure thing, this is a very resource-consuming process which has nothing to do with actual program execution.