Why is reading and writing files in Go much slower than in Perl?

Question

I am using Go to improve code efficiency, but when I use Go to read and write files, I find that it is not as fast as Perl. Is this a problem with my code, or is there another reason?

Build input file:

# Input file:
for i in $(seq 1 600000); do
    echo SERVER$((RANDOM%800+100)),$RANDOM,$RANDOM,$RANDOM >> sample.csv
done

Read and write files with Perl:

time cat sample.csv | perl -ne 'chomp;print"$_"' > out.txt
real	0m0.249s
user	0m0.083s
sys	0m0.049s

Read and write files with Go:

package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

func main() {

	filepath := "./sample.csv"
	file, err := os.OpenFile(filepath, os.O_RDWR, 0666)
	if err != nil {
		fmt.Println("Open file error!", err)
		return
	}
	defer file.Close()
	buf := bufio.NewReader(file)
	for {
		line, err := buf.ReadString('\n')
		line = strings.TrimSpace(line)
		fmt.Println(line)
		if err != nil {
			if err == io.EOF {
				fmt.Println("File read ok!")
				break
			} else {
				fmt.Println("Read file error!", err)
				return
			}
		}
	}
}

Then I run:

time go run read.go > out.txt
real	0m2.332s
user	0m0.326s
sys	0m2.038s

Why are reads and writes in Go almost 10 times slower than in Perl?

Answer 1

Score: 15


You're comparing apples to oranges.

At least two methodological errors:

  1. Your Perl incantation measures how cat reads the file and sends its contents over pipe(2), and perl reads data from there, processes it and writes the results to its stdout.

  2. Your Go incantation

    • measures a full build pass of the Go toolchain (which includes compilation, linking and writing out an executable image file) and then a run of the compiled program¹, and
    • measures unbuffered writes to stdout (the fmt.Print* calls), whereas in the Perl code writes to standard output, to cite the docs, 'typically can be line buffered if output is to the terminal and block buffered otherwise.'
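To make the buffering point concrete, here is a minimal sketch that counts how many Write calls reach the underlying writer with and without bufio.Writer. The names countingWriter, writeLines and countWrites are made up for this illustration, and io.Discard stands in for os.Stdout; on a real os.Stdout each of those Write calls would cost roughly one write(2) system call.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// countingWriter (hypothetical helper) counts the Write calls it receives
// before passing the data on to the wrapped writer.
type countingWriter struct {
	w     io.Writer
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return c.w.Write(p)
}

// writeLines writes n identical CSV-like lines to w using fmt.Fprintln,
// the same way the program in the question prints each line.
func writeLines(w io.Writer, n int) error {
	for i := 0; i < n; i++ {
		if _, err := fmt.Fprintln(w, "SERVER123,1,2,3"); err != nil {
			return err
		}
	}
	return nil
}

// countWrites reports how many Write calls reach the destination when
// writing n lines directly and when writing them through a bufio.Writer.
func countWrites(n int) (unbuffered, buffered int, err error) {
	direct := &countingWriter{w: io.Discard}
	if err := writeLines(direct, n); err != nil {
		return 0, 0, err
	}

	sink := &countingWriter{w: io.Discard}
	bw := bufio.NewWriter(sink) // default buffer size
	if err := writeLines(bw, n); err != nil {
		return 0, 0, err
	}
	// Without Flush the tail of the output would stay in the buffer.
	if err := bw.Flush(); err != nil {
		return 0, 0, err
	}
	return direct.calls, sink.calls, nil
}

func main() {
	unbuffered, buffered, err := countWrites(600000)
	if err != nil {
		fmt.Fprintln(os.Stderr, "counting failed:", err)
		os.Exit(1)
	}
	fmt.Printf("unbuffered: %d Write calls\n", unbuffered)
	fmt.Printf("buffered:   %d Write calls\n", buffered)
}
```

The unbuffered count equals the number of lines, while the buffered count is smaller by orders of magnitude, which fits the large sys time in the question's measurement.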

Let's try to compare apples to apples.

First, here's a comparable Go implementation:

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
)

func main() {
	// Read stdin line by line and buffer all writes to stdout.
	in := bufio.NewScanner(os.Stdin)
	out := bufio.NewWriter(os.Stdout)

	for in.Scan() {
		// Strip surrounding whitespace; the newline is already removed by Scan.
		s := bytes.TrimSpace(in.Bytes())

		if _, err := out.Write(s); err != nil {
			fmt.Fprintln(os.Stderr, "failed to write file:", err)
			os.Exit(1)
		}
	}

	// Flush whatever is still sitting in the output buffer.
	if err := out.Flush(); err != nil {
		fmt.Fprintln(os.Stderr, "failed to write file:", err)
		os.Exit(1)
	}

	// Scan returns false on both EOF and error; check which one it was.
	if err := in.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading failed:", err)
		os.Exit(1)
	}
}

Let's save it as chomp.go and measure:

  1. Build the code:

    $ go build chomp.go

  2. Generate the input file:

    $ for i in $(seq 1 600000); do echo SERVER$((RANDOM%800+100)),$RANDOM,$RANDOM,$RANDOM; done >sample.csv

  3. Run the Perl code:

    $ time { perl -ne 'chomp; print "$_";' <sample.csv >out1.txt; }
    
    real	0m0.226s
    user	0m0.102s
    sys	0m0.048s
    
  4. Run it again to make sure it had read the input file from the filesystem cache:

    $ time { perl -ne 'chomp; print "$_";' <sample.csv >out1.txt; }
    
    real	0m0.123s
    user	0m0.090s
    sys	0m0.033s
    

    Note how the execution time has gone down.

  5. Run the Go code on the cached input:

    $ time { ./chomp <sample.csv >out2.txt; }
    
    real	0m0.063s
    user	0m0.032s
    sys	0m0.032s
    
  6. Make sure the results are the same:

    $ cmp out1.txt out2.txt

As you can see, on my linux/amd64 system with an SSD the results are in the same ballpark.

Well, I should also state that to get sensible results, you'd need to run each command, like, 1000 times and average the results in each batch, and compare those numbers instead, but I think it's enough to demonstrate what the problems with your approach were.
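As a rough sketch of that "run it many times and average" idea, something like the following could be used. The names meanDuration and runOnce and the command-line layout are assumptions made for this illustration, not an existing tool. Note two differences from the measurements above: the child's stdout is left unset, so os/exec connects it to the null device instead of a file, and each timing includes process start-up, just as time does.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// meanDuration (hypothetical helper) calls run n times and returns the
// mean wall-clock time per call.
func meanDuration(n int, run func() error) (time.Duration, error) {
	if n <= 0 {
		return 0, fmt.Errorf("n must be positive, got %d", n)
	}
	var total time.Duration
	for i := 0; i < n; i++ {
		start := time.Now()
		if err := run(); err != nil {
			return 0, err
		}
		total += time.Since(start)
	}
	return total / time.Duration(n), nil
}

// runOnce runs the command given in argv with the input file on its stdin
// and waits for it to finish. Stdout is left nil, which makes os/exec
// connect it to the null device.
func runOnce(input string, argv []string) error {
	in, err := os.Open(input)
	if err != nil {
		return err
	}
	defer in.Close()

	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Stdin = in
	return cmd.Run()
}

func main() {
	// Assumed invocation: bench sample.csv ./chomp
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: bench INPUT COMMAND [ARGS...]")
		return
	}
	input, argv := os.Args[1], os.Args[2:]

	const runs = 1000
	mean, err := meanDuration(runs, func() error { return runOnce(input, argv) })
	if err != nil {
		fmt.Fprintln(os.Stderr, "benchmark failed:", err)
		os.Exit(1)
	}
	fmt.Printf("%d runs, mean %v per run\n", runs, mean)
}
```

Built as bench, it could be run once with ./chomp and once with perl -ne 'chomp; print "$_";' as the command, and the two means compared.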

One more thing to consider: the run time of these two programs is overwhelmingly dominated by filesystem I/O, so if you thought Go would be faster at that, your expectation was unfounded: both programs spend most of their time sleeping in the kernel's read(2) and write(2) system calls. A Go program might be faster than a Perl program in certain cases involving CPU crunching (especially if it's written to take advantage of multi-core systems), but your example is simply not that case.

Oh, and just to make the unstated fact explicit: while the Go language specification does not say anything about how the runtime system of a Go implementation must be done, both existing state-of-the-art Go implementations (one of which you're ostensibly using) rely on AOT (ahead-of-time) compilation, and go run is a hack for one-off throw-away gigs, intended neither for serious work nor for executing code of any serious level of complexity. In simpler words, Go-that-you-are-using is not an interpreted language, even though the availability of go run might make it appear so. In fact, it does what a normal go build would do, then runs the resulting executable file, then throws it away.


¹ You might be tempted to say that Perl also deals with "the source code", but the Perl interpreter is highly optimized for dealing with scripts, while Go's build toolchain, although crazy fast in comparison to most other compiled languages, is not optimized for that.
What is possibly a more glaring distinction is that the Perl interpreter actually interprets your (very simple) script, and chomp and print are so-called "built-ins": functions readily provided to the executing script by the interpreter. Compare that with building a Go program, which involves the compiler parsing the source code file and transforming it into machine code, and the linker actually reading the files of the Go standard library's compiled packages (all those which are imported), taking bits of code from them, combining all those bits of machine code and writing out an executable image file (which is much like what the perl binary itself is!). Sure thing, this is a very resource-consuming process which has nothing to do with actual program execution.

huangapple
  • Posted on May 19, 2023, 16:49:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76287477.html