2012-09-25 09:00:20 · go

Can Go really be that much faster than Python?

Question

I think I may have implemented this incorrectly, because the results do not make sense. I have a Go program that counts to 1000000000:

package main

import (
    "fmt"
)

func main() {
    for i := 0; i < 1000000000; i++ {}
    fmt.Println("Done")
}

It finishes in less than a second. On the other hand, I have a Python script:

x = 0
while x < 1000000000:
    x += 1
print 'Done'

It takes a few minutes to finish.

Why is the Go version so much faster? Are they both counting up to 1000000000, or am I missing something?

Answer 1

Score: 93

One billion is not a very big number. Any reasonably modern machine should be able to do this in a few seconds at most, if it's able to do the work with native types. I verified this by writing an equivalent C program, reading the assembly to make sure that it actually was doing addition, and timing it (it completes in about 1.8 seconds on my machine).

Python, however, doesn't have a concept of natively typed variables (or meaningful type annotations at all), so it has to do hundreds of times as much work in this case. In short, the answer to your headline question is "yes". Go really can be that much faster than Python, even without any kind of compiler trickery like optimizing away a side-effect-free loop.
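
One rough way to see the extra work is to list the bytecode that CPython runs for the counting loop. The sketch below assumes CPython 3 and uses a hypothetical function name, `count`; the opcode names it prints differ between CPython versions, so it only reports them rather than relying on any particular one.

```python
import dis


def count(limit):
    """Count up to limit one step at a time, like the loop in the question."""
    x = 0
    while x < limit:
        x += 1
    return x


# Every instruction listed here is dispatched by the interpreter loop;
# the exact opcode names vary between CPython versions.
instructions = list(dis.get_instructions(count))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

print("bytecode instructions in count():", len(instructions))
```

Each pass through the loop runs several of these instructions, and each one works on full Python objects, whereas a natively typed loop can compile down to a handful of machine instructions.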

Answer 2

Score: 76

PyPy actually does an impressive job of speeding up this loop:

import time

def main():
    x = 0
    while x < 1000000000:
        x += 1

if __name__ == "__main__":
    s = time.time()
    main()
    print time.time() - s

$ python count.py
44.221405983
$ pypy count.py
1.03511095047

That is about 97% less run time (roughly 43 times faster)!

Clarification for the 3 people who didn't "get it": the Python language itself isn't slow. The CPython implementation is a relatively straightforward way of running the code. PyPy is another implementation of the language that does many tricky things (especially the JIT) that can make enormous differences. To answer the question in the title directly: Go isn't "that much" faster than Python; Go is that much faster than CPython.

Having said that, the code samples aren't really doing the same thing. Python needs to instantiate 1000000000 of its int objects. Go is just incrementing one memory location.
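
The 97% figure can be read either as a reduction in run time or as a speedup factor. A small sketch of the arithmetic, using only the two timings reported in this answer (the variable names are illustrative):

```python
# Timings copied from the two runs shown in this answer, in seconds.
cpython_seconds = 44.221405983
pypy_seconds = 1.03511095047

# How many times faster the PyPy run was.
speedup_factor = cpython_seconds / pypy_seconds

# Share of the original run time that was saved.
time_saved_percent = (1 - pypy_seconds / cpython_seconds) * 100

print("PyPy is %.1f times faster" % speedup_factor)
print("run time reduced by %.1f%%" % time_saved_percent)
```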

Answer 3

Score: 25

This scenario will highly favor decent natively compiled, statically typed languages. Such languages are capable of emitting a very trivial loop of, say, 4-6 CPU opcodes that uses a simple check condition for termination. This loop has effectively zero branch prediction misses and can be thought of as performing an increment every CPU cycle (this isn't entirely true, but...).

Python implementations have to do significantly more work, primarily due to dynamic typing. Python must make several different calls (internal and external) just to add two ints together. In Python it must call __add__ (it is effectively i = i.__add__(1), but this syntax will only work in Python 3.x), which in turn has to check the type of the value passed (to make sure it is an int), then it adds the integer values (extracting them from both of the objects), and then the new integer value is wrapped up again in a new object. Finally, it re-assigns the new object to the local variable. That's significantly more work than a single opcode to increment, and it doesn't even address the loop itself. By comparison, the Go/native version is likely only incrementing a register by side effect.

Java will fare much better in a trivial benchmark like this and will likely be fairly close to Go; the JIT and the static typing of the counter variable can ensure this (it uses a special integer-add JVM instruction). Once again, Python has no such advantage. Now, there are some implementations like PyPy/RPython, which run a static-typing phase and should fare much better than CPython here.
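
The sequence described above (look up `__add__`, check the operand type, produce a new object, rebind the name) can be poked at directly. This is a minimal sketch assuming CPython 3; it spells the addition out by hand and is not how the interpreter is implemented internally.

```python
i = 41

# Spelled out, "i + 1" asks the int type to perform the addition.
j = type(i).__add__(i, 1)
print(j)           # the sum, held in its own int object
print(j is not i)  # the result is a different object from the one we started with

# The method checks its operand's type and declines anything that is not an int.
print(type(i).__add__(i, "one") is NotImplemented)
```

Rebinding `i = j` would then complete one increment, which is the work the interpreter repeats on every pass through the loop.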

Answer 4

Score: 10

You've got two things at work here. The first is that Go is compiled to machine code and run directly on the CPU, while Python is compiled to bytecode that runs on a (particularly slow) VM.

The second, and more significant, thing impacting performance is that the semantics of the two programs are actually significantly different. The Go version makes a "box" called "x" that holds a number and increments it by 1 on each pass through the loop. The Python version actually has to create a new "box" (int object) on each cycle (and, eventually, has to throw them away). We can demonstrate this by modifying your programs slightly:

package main

import (
    "fmt"
)

func main() {
    // Declare i outside the for clause so that one variable is shared by
    // all iterations (since Go 1.22, a variable declared in the for clause
    // is created anew on each iteration).
    var i int
    for i = 0; i < 10; i++ {
        fmt.Printf("%d %p\n", i, &i)
    }
}

...and:

x = 0
while x < 10:
    x += 1
    print x, id(x)

This is because Go, due to its C roots, takes a variable name to refer to a place, whereas Python takes variable names to refer to things. Since an integer is considered a unique, immutable entity in Python, we must constantly make new ones. Python should be slower than Go, but you've picked a worst-case scenario: in the Benchmarks Game, we see Go being, on average, about 25 times faster (100 times in the worst case).

You've probably read that, if your Python programs are too slow, you can speed them up by moving things into C. Fortunately, in this case, somebody's already done this for you. If you rewrite your empty loop to use xrange() like so:

for x in xrange(1000000000):
    pass
print "Done."

...you'll see it run about twice as fast. If you find loop counters to actually be a major bottleneck in your program, it might be time to investigate a new way of solving the problem.
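
To try the comparison without waiting minutes, here is a small timing sketch. It assumes Python 3, where `range` plays the role that `xrange` plays in the Python 2 code in this answer, and it uses a much smaller limit than the question does. The helper names (`count_with_while`, `count_with_range`, `timed`) are made up for illustration, and the ratio you see will depend on your interpreter and machine.

```python
import time


def count_with_while(limit):
    """Count by rebinding x to a new int object on every pass."""
    x = 0
    while x < limit:
        x += 1
    return x


def count_with_range(limit):
    """Let the C-implemented range iterator produce the counter values."""
    x = 0
    for x in range(1, limit + 1):
        pass
    return x


def timed(func, limit):
    """Return func(limit) together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = func(limit)
    return result, time.perf_counter() - start


n = 1_000_000  # far smaller than the question's 1000000000, to keep the run short
while_result, while_seconds = timed(count_with_while, n)
range_result, range_seconds = timed(count_with_range, n)
print("while: %.4fs, range: %.4fs" % (while_seconds, range_seconds))
```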

Answer 5

Score: 4

@troq

I'm a little late to the party, but I'd say the answer is yes and no. As @gnibbler pointed out, CPython is slower in the simple implementation, but PyPy is JIT-compiled for much faster code when you need it.

If you're doing numeric processing with CPython, most people will do it with numpy, which gives fast operations on arrays and matrices. Recently I've been doing a lot with numba, which allows you to add a simple wrapper to your code. For this one I just added @njit to a function incALot() which runs your code above.

On my machine CPython takes 61 seconds, but with the numba wrapper it takes 7.2 microseconds, which will be similar to C and maybe faster than Go. That's an 8 million times speedup.

So, in Python, if things with numbers seem a bit slow, there are tools to address it, and you still get Python's programmer productivity and the REPL.

import time

from numba import njit

def incALot(y):
    x = 0
    while x < y:
        x += 1

@njit('i8(i8)')
def nbIncALot(y):
    x = 0
    while x < y:
        x += 1
    return x

size = 1000000000
start = time.time()
incALot(size)
t1 = time.time() - start
start = time.time()
x = nbIncALot(size)
t2 = time.time() - start
print('CPython3 takes %.3fs, Numba takes %.9fs' % (t1, t2))
print('Speedup is: %.1f' % (t1/t2))
print('Just Checking:', x)

Output:

CPython3 takes 58.958s, Numba takes 0.000007153s
Speedup is: 8242982.2
Just Checking: 1000000000
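
It can help to check what the reported Numba time implies. The sketch below divides the iteration count by that time and, for comparison, estimates how long a billion increments would take on a hypothetical 5 GHz core doing one increment per cycle; the clock speed is an assumption for illustration, not a measurement.

```python
iterations = 1_000_000_000
numba_seconds = 0.000007153  # Numba timing reported in the output above

# Increments per second the loop would need if every iteration really ran.
implied_rate = iterations / numba_seconds

# Assumption for illustration: a 5 GHz core retiring one increment per cycle.
assumed_clock_hz = 5e9
seconds_if_executed = iterations / assumed_clock_hz

print("implied rate: %.3g increments per second" % implied_rate)
print("time if every iteration ran: %.2f s" % seconds_if_executed)
```

The implied rate is tens of thousands of times higher than the assumed clock, which suggests that the compiled function did not execute the loop step by step. An optimizing compiler can replace a loop like this with its final value, so the figure probably reflects that rather than raw counting speed; that is an inference from the numbers, not something measured here.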

Answer 6

Score: 0

The problem is that Python is interpreted and Go isn't, so there's no real way to bench-test their speeds against each other. Interpreted languages usually (not always) have a VM component, and that's where the problem lies: any test you run is running within interpreted bounds, not actual runtime bounds. Go is slightly slower than C, mostly because it uses garbage collection instead of manual memory management. That said, Go is fast compared to Python because it's a compiled language. The only thing lacking in Go is bug testing; I stand corrected if I'm wrong.

Answer 7

Score: -1

It is possible that the compiler realized that you didn't use the "i" variable after the loop, so it optimized the final code by removing the loop.

Even if you used it afterwards, the compiler is probably smart enough to replace the loop with

i = 1000000000;

Hope this helps =)

Answer 8

Score: -6

I'm not familiar with Go, but I'd guess that the Go version ignores the loop, since the body of the loop does nothing. In the Python version, on the other hand, you are incrementing x in the body of the loop, so it's probably actually executing the loop.
