Is Go slower than Python?

Question

I have the following Go code:

package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    reader := bufio.NewReader(os.Stdin)
    scanner := bufio.NewScanner(reader)

    for scanner.Scan() {
        fmt.Println(scanner.Text())
    }
}

and the following Python code:

import sys

for ln in sys.stdin:
    print ln,

Both simply read lines from standard input and print them to standard output. Tested on a 16-million-line text file with output redirected to /dev/null, the Python version takes only about a quarter of the time the Go version needs. Why is that?

UPDATE: Following JimB and siritinga's advice, I changed Go's output to a buffered version. Now the Go version is much faster, but still about 75% slower than the Python version.

package main

import (
    "bufio"
    "os"
)

func main() {
    reader := bufio.NewReader(os.Stdin)
    scanner := bufio.NewScanner(reader)
    writer := bufio.NewWriter(os.Stdout)
    // Flush on exit so the last buffered chunk is not lost.
    defer writer.Flush()

    for scanner.Scan() {
        writer.WriteString(scanner.Text() + "\n")
    }
}

Answer 1

Score: 4

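As JimB said, stop using strings. Python 2.x strings are just raw bytes, so the Python loop never leaves the byte level. Go strings are immutable and conventionally hold UTF-8 text, so scanner.Text() has to allocate and copy a new string for every line. (Go does not validate the encoding during that conversion; the cost is the allocation and the copy.) On the other hand, you also get more features out of strings.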

If you switch your Python implementation to unicode strings (by upgrading to 3.x or by using unicode strings in 2.x), its performance will tank. If you change the Go version to work on raw bytes instead, you will get much better performance:

package main

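import (
    "bufio"
    "os"
)

func main() {
    reader := bufio.NewReader(os.Stdin)
    scanner := bufio.NewScanner(reader)
    writer := bufio.NewWriter(os.Stdout)
    // Flush on exit so the last buffered chunk is not lost.
    defer writer.Flush()
    newline := []byte("\n")

    for scanner.Scan() {
        // Bytes returns a slice into the scanner's buffer, so no string is built.
        writer.Write(scanner.Bytes())
        writer.Write(newline)
    }
}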

On my system, using a word list with 65 million lines, the Python version takes:

real	0m12.724s
user	0m12.581s
sys	    0m0.145s

And the Go version:

real	0m4.408s
user	0m4.276s
sys	    0m0.135s

It should also be noted that, as far as performance comparisons go, this is not a good case. It does not represent what a real application would do, which is to process the data in some way.

huangapple
  • Posted on January 16, 2015 at 00:06:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/27967765.html