October 2, 2014, 05:48:08 · go

Why is this simple loop faster in Go than in C?

Question

I was trying to find out whether Go's loop performance is as good as C's, but surprisingly found that, in my simple test, the C version takes twice as long as the Go version.

C Version:

#include <stdio.h>

int main() {
  int i = 0, a = 0;

  while (i < 1e9) {
    a = (a + i) % 42;
    i = i + 1;
  }
  printf("%d\n", a);
}

$ gcc -o main main.c && time ./main # tried -O0 as well; the result is similar
36
./main  10.53s user 0.08s system 98% cpu 10.769 total

Go Version:

package main

import "fmt"

func main() {
	a := int32(0)
	for i := int32(0); i < 1e9; i++ {
		a = (a + i) % 42
	}
	fmt.Println(a)
}

$ time go run main.go
36
colorgo run main.go  5.27s user 0.14s system 93% cpu 5.816 total

(tested on Darwin, amd64)

For this simple algorithm, shouldn't both of them produce nearly identical machine code? Is this due to compiler optimization? Cache efficiency?

Please help me understand! Thanks!

Answer 1

Score: 3

It all boils down to the assembly generated.

go tool 6g -S (21 instructions):

MOVL    $0,SI
MOVL    SI,"".a+8(FP)
MOVL    $0,CX
CMPL    CX,$1000000000
JGE     $0,58
ADDL    CX,SI
MOVL    $818089009,BP
MOVL    SI,AX
IMULL   BP,
MOVL    DX,BX
SARL    $3,BX
MOVL    SI,BP
SARL    $31,BP
SUBL    BP,BX
IMULL   $42,BX
SUBL    BX,SI
MOVL    SI,"".a+8(FP)
INCL    ,CX #point A
NOP     ,
CMPL    CX,$1000000000
JLT     $0,16
RET     ,

gcc -O3 -march=native -S (17 instructions):

leal    (%rsi,%rcx), %edi
addl    $1, %ecx
vxorpd  %xmm0, %xmm0, %xmm0
vcvtsi2sd       %ecx, %xmm0, %xmm0
movl    %edi, %eax
imull   %r8d
movl    %edi, %eax
sarl    $31, %eax
sarl    $3, %edx
movl    %edx, %esi
subl    %eax, %esi
imull   $42, %esi, %esi
subl    %esi, %edi
vucomisd        %xmm0, %xmm1
movl    %edi, %esi
ja      .L2
subq    $8, %rsp

gcc -O3 -march=native -S (14 instructions, after replacing 1e9 with 1000000000):

leal    (%rdx,%rcx), %esi
addl    $1, %ecx
movl    %esi, %eax
imull   %edi
movl    %esi, %eax
sarl    $31, %eax
sarl    $3, %edx
subl    %eax, %edx
imull   $42, %edx, %edx
subl    %edx, %esi
movl    %esi, %edx
cmpl    $1000000000, %ecx
jne     .L2
subq    $8, %rsp
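
The extra `vcvtsi2sd`/`vucomisd` pair in the 17-instruction listing is there because in C `1e9` is a constant of type `double`, so `i < 1e9` converts `i` to floating point on every iteration. Go treats `1e9` as an untyped constant and converts it to the other operand's type at compile time, which is why the Go listing compares against `$1000000000` directly. A minimal sketch of that Go behaviour follows; `limit` and `below` are illustrative names, not part of the original programs.

```go
package main

import "fmt"

// limit is an untyped floating-point constant, as in the loop condition.
const limit = 1e9

// below (hypothetical helper) compares an int32 against the constant.
// The constant is converted to int32 at compile time, which is allowed
// because 1e9 is exactly representable as an int32.
func below(i int32) bool {
	return i < limit
}

func main() {
	fmt.Println(below(999999999), below(1000000000)) // true false
	fmt.Printf("%T\n", int32(limit))                 // int32
}
```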

Timing:

$ gcc -O3 -march=native loop.c; and time ./a.out
36
2.92user 0.00system 0:02.93elapsed 99%CPU
$ go build -o loop loop.go; and time ./loop
36
2.89user 0.00system 0:02.90elapsed 99%CPU
$ gcc -O3 -march=native loop_nofp.c; and time ./a.out
36
2.92user 0.00system 0:02.94elapsed 99%CPU (0avgtext+0avgdata 1312maxresident)
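
The figures above come from the shell's `time`, so they include process startup (and, with `go run` as in the question, compilation as well). One way to time only the loop is to measure it inside the program. This is a sketch under that assumption; the `run` function and its parameter `n` are introduced here for illustration and are not in the original code.

```go
package main

import (
	"fmt"
	"time"
)

// run (hypothetical helper) executes the loop from the question for n iterations.
func run(n int32) int32 {
	a := int32(0)
	for i := int32(0); i < n; i++ {
		a = (a + i) % 42
	}
	return a
}

func main() {
	start := time.Now()
	a := run(1e9)
	// Print the result together with the elapsed wall-clock time.
	fmt.Println(a, time.Since(start))
}
```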

I have no idea; I'm leaving this as is for now until a proper answer is posted.

//edit

Changing the C code to use a for loop to match the Go version produced different assembly but exactly the same timing.

#include <stdint.h> /* int32_t */
#include <stdio.h>  /* printf */

int main() {
	int32_t i = 0, a = 0;
	for (i = 0; i < 1e9; i++) {
		a = (a + i) % 42;
	}
	printf("%d\n", a);
	return 0;
}

Answer 2

Score: 1

They take about the same time when optimized. For example,

Go:

$ cat t.go
package main

import "fmt"

func main() {
	a := int32(0)
	for i := int32(0); i < 1e9; i++ {
		a = (a + i) % 42
	}
	fmt.Println(a)
}
$ go version
go version devel +e1a081e6ddf8 Sat Sep 27 11:56:54 2014 -0700 linux/amd64
$ go build t.go && time ./t
36
real	0m15.809s
user	0m15.815s
sys	0m0.061s

$ cat t.c
#include <stdio.h>

int main() {
  int i = 0, a = 0;

  while (i < 1e9) {
    a = (a + i) % 42;
    i = i + 1;
  }
  printf("%d\n", a);
}
$ gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
$ gcc -O3 t.c && time ./a.out
36
real	0m16.538s
user	0m16.528s
sys	0m0.021s
