Go slower than Python?
Question
I have the following Go code:
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	reader := bufio.NewReader(os.Stdin)
	scanner := bufio.NewScanner(reader)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
}
and the following Python code:
import sys
for ln in sys.stdin:
    print ln,
Both simply read lines from standard input and print to standard output. The Python version only uses 1/4 of the time the Go version needs (tested on a 16 million line text file and output to /dev/null). Why is that?
UPDATE: Following JimB and siritinga's advice, I changed Go's output to a buffered version. Now the Go version is much faster, but still about 75% slower than the Python version.
package main

import (
	"bufio"
	"os"
)

func main() {
	reader := bufio.NewReader(os.Stdin)
	scanner := bufio.NewScanner(reader)
	writer := bufio.NewWriter(os.Stdout)
	// Flush on exit; otherwise the last buffered chunk is never written.
	defer writer.Flush()
	for scanner.Scan() {
		writer.WriteString(scanner.Text() + "\n")
	}
}
Answer 1
Score: 4
As JimB said, stop using strings. Python 2.x strings are just raw bytes, so the Python loop passes each line through untouched. In the Go version, scanner.Text() allocates a new string and copies the line into it on every iteration, and appending "\n" allocates and copies once more. (Go strings are conventionally UTF-8, but converting bytes to a string does not validate or re-encode them; the cost here is the allocation and the copying.) On the other hand, you also get more features out of strings.
If you switch your Python implementation to unicode strings (by upgrading to 3.x or by using unicode strings in 2.x), its performance will tank. If you switch the Go version to work on raw bytes as Python 2.x does, you will get much better performance:
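package main

import (
	"bufio"
	"os"
)

func main() {
	reader := bufio.NewReader(os.Stdin)
	scanner := bufio.NewScanner(reader)
	writer := bufio.NewWriter(os.Stdout)
	// Flush on exit; otherwise the last buffered chunk is never written.
	defer writer.Flush()
	newline := []byte("\n")
	for scanner.Scan() {
		// Bytes() returns a view into the scanner's buffer: no copy, no allocation.
		writer.Write(scanner.Bytes())
		writer.Write(newline)
	}
}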
On my system, using a word list with 65 million lines, Python:
real 0m12.724s
user 0m12.581s
sys 0m0.145s
And the Go version:
real 0m4.408s
user 0m4.276s
sys 0m0.135s
It should also be noted that this is not a good case for a performance comparison. It does not represent what a real application would do, which would be to process the data in some way.