Why is gccgo slower than gc in that particular case?
Question
I'm sure everyone who knows Go knows this blog post.
Reading it again, I wondered whether using gccgo instead of go build would increase the speed a bit more. In my typical use case (scientific computations), a gccgo-generated binary is always faster than a go build-generated one.
So, just grab this file, havlak6.go, and compile it:
go build -o havlak6_go havlak6.go
gccgo -o havlak6_gccgo -march=native -Ofast havlak6.go
Surprise!
$ /usr/bin/time ./havlak6_go
5.45user 0.06system 0:05.54elapsed 99%CPU
$ /usr/bin/time ./havlak6_gccgo
11.38user 0.16system 0:11.74elapsed 98%CPU
I'm curious and want to know why an "optimizing" compiler produces slower code.
I tried to use gprof on the gccgo-generated binary:
gccgo -pg -march=native -Ofast havlak6.go
./a.out
gprof a.out gmon.out
with no luck:
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
As you can see, the code has not actually been profiled.
Of course, I read this, but the program takes 10+ seconds to execute, so at 0.01 seconds per sample the number of samples should be > 1000.
I also tried:
rm a.out gmon.out
LDFLAGS='-g -pg' gccgo -g -pg -march=native -Ofast havlak6.go
./a.out
gprof
No success either.
Do you know what's wrong? Do you have any idea why gccgo, with all its optimization routines, fails to be faster than gc in this case?
go version: 1.0.2
gcc version: 4.7.2
EDIT:
Oh, I completely forgot to mention... I obviously tried pprof on the gccgo-generated binary. Here is a top10:
Welcome to pprof! For help, type 'help'.
(pprof) top10
Total: 1143 samples
1143 100.0% 100.0% 1143 100.0% 0x00007fbfb04cf1f4
0 0.0% 100.0% 890 77.9% 0x00007fbfaf81101e
0 0.0% 100.0% 4 0.3% 0x00007fbfaf8deb64
0 0.0% 100.0% 1 0.1% 0x00007fbfaf8f2faf
0 0.0% 100.0% 3 0.3% 0x00007fbfaf8f2fc5
0 0.0% 100.0% 1 0.1% 0x00007fbfaf8f2fc9
0 0.0% 100.0% 1 0.1% 0x00007fbfaf8f2fd6
0 0.0% 100.0% 1 0.1% 0x00007fbfaf8f2fdf
0 0.0% 100.0% 2 0.2% 0x00007fbfaf8f4a2f
0 0.0% 100.0% 1 0.1% 0x00007fbfaf8f4a33
And that's why I'm looking for something else.
EDIT2:
Since it seems that someone wants my question to be closed: I did not try to use gprof out of the blue, see https://groups.google.com/d/msg/golang-nuts/1xESoT5Xcd0/bpMvxQeJguMJ
Answer 1
Score: 2
Running the gccgo-generated binary under Valgrind seems to indicate that gccgo has an inefficient memory allocator. This may be one of the reasons why gccgo 4.7.2 is slower than go 1.0.2. It is impossible to run a binary generated by go 1.0.2 under Valgrind, so it is hard to confirm whether memory allocation is gccgo's primary performance problem in this case.
Answer 2
Score: 0
Remember that go build also defaults to static linking, so for an apples-to-apples comparison you should give gccgo the -static or -static-libgo option.