Why is gccgo slower than gc in that particular case?

Question

I'm sure everyone who knows Go is familiar with this blog post.

Reading it again, I wondered whether using gccgo instead of go build would squeeze out a bit more speed. In my typical use case (scientific computation), a gccgo-generated binary is always faster than one produced by go build.

So, just grab this file: havlak6.go and compile it:

go build -o havlak6_go havlak6.go
gccgo -o havlak6_gccgo -march=native -Ofast havlak6.go

Surprise!

$ /usr/bin/time ./havlak6_go
5.45user 0.06system 0:05.54elapsed 99%CPU

$ /usr/bin/time ./havlak6_gccgo
11.38user 0.16system 0:11.74elapsed 98%CPU

I'm curious to know why an "optimizing" compiler produces slower code.

I tried to use gprof on the gccgo-generated binary:

gccgo -pg -march=native -Ofast havlak6.go
./a.out
gprof a.out gmon.out

with no luck:

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

As you can see, the code has not actually been profiled.

Of course, I read this, but as you can see, the program takes 10+ seconds to execute... The number of samples should be > 1000.

I also tried:

rm a.out gmon.out
LDFLAGS='-g -pg' gccgo -g -pg -march=native -Ofast havlak6.go
./a.out
gprof

No success either.

Do you know what's wrong? Do you have any idea why gccgo, with all its optimization routines, fails to be faster than gc in this case?

go version: 1.0.2
gcc version: 4.7.2

EDIT:

Oh, I completely forgot to mention... I obviously tried pprof on the gccgo-generated binary... Here is a top10:

Welcome to pprof!  For help, type 'help'.
(pprof) top10
Total: 1143 samples
    1143 100.0% 100.0%     1143 100.0% 0x00007fbfb04cf1f4
       0   0.0% 100.0%      890  77.9% 0x00007fbfaf81101e
       0   0.0% 100.0%        4   0.3% 0x00007fbfaf8deb64
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f2faf
       0   0.0% 100.0%        3   0.3% 0x00007fbfaf8f2fc5
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f2fc9
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f2fd6
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f2fdf
       0   0.0% 100.0%        2   0.2% 0x00007fbfaf8f4a2f
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f4a33

And that's why I'm looking for something else.

EDIT2:

Since it seems that someone wants my question to be closed, I did not try to use gprof out of the blue: https://groups.google.com/d/msg/golang-nuts/1xESoT5Xcd0/bpMvxQeJguMJ

Answer 1

Score: 2

Running the gccgo-generated binary under Valgrind seems to indicate that gccgo has an inefficient memory allocator. This may be one of the reasons why gccgo 4.7.2 is slower than go 1.0.2. It is impossible to run a binary generated by go 1.0.2 under Valgrind, so it is hard to confirm for a fact whether memory allocation is gccgo's primary performance problem in this case.

Answer 2

Score: 0

Remember that go build also defaults to static linking, so for an apples-to-apples comparison you should give gccgo the -static or -static-libgo option.

huangapple
  • Posted on February 26, 2013 at 01:41:17
  • When reposting, please retain the link to this article: https://go.coder-hub.com/15073027.html