Why are binaries built with gccgo smaller (among other differences)?
Question
I've been experimenting with gc and gccgo, and I've encountered some odd behaviour.
Using a program I once wrote to test a theorem, I got these results (I removed unnecessary information for readability):
$ time go build -compiler gc -o checkprog_gc checkprog.go (x 3)
go build <...> 0.13s user 0.02s system 100% cpu 0.149 total
go build <...> 0.13s user 0.01s system 99% cpu 0.148 total
go build <...> 0.14s user 0.03s system 100% cpu 0.162 total
--> average: 0.13s user 0.02s system 100% cpu 0.153 total
$ time go build -compiler gccgo -o checkprog_gccgo checkprog.go (x 3)
go build <...> 0.10s user 0.03s system 96% cpu 0.135 total
go build <...> 0.12s user 0.01s system 96% cpu 0.131 total
go build <...> 0.10s user 0.01s system 92% cpu 0.123 total
--> average: 0.11s user 0.02s system 95% cpu 0.130 total
$ strip -s -o checkprog_gc_stripped checkprog_gc
$ strip -s -o checkprog_gccgo_stripped checkprog_gccgo
$ ls -l
1834504 checkprog_gc*
1336992 checkprog_gc_stripped*
35072 checkprog_gccgo*
24192 checkprog_gccgo_stripped*
$ time ./checkprog_gc
./checkprog_gc 6.68s user 0.01s system 100% cpu 6.674 total
./checkprog_gc 6.75s user 0.01s system 100% cpu 6.741 total
./checkprog_gc 6.66s user 0.00s system 100% cpu 6.643 total
--> average: 6.70s user 0.01s system 100% cpu 6.686 total
$ time ./checkprog_gccgo
./checkprog_gccgo 10.95s user 0.02s system 100% cpu 10.949 total
./checkprog_gccgo 10.98s user 0.01s system 100% cpu 10.964 total
./checkprog_gccgo 10.94s user 0.01s system 100% cpu 10.929 total
--> average: 10.96s user 0.01s system 100% cpu 10.947 total
I can see the following patterns:
- Binaries built with gccgo are radically smaller in size (and stripping doesn't change this difference).
- Binaries built with gc are faster to execute.
- It takes a bit more time to build with gccgo than with gc.
I also tested some other go programs (while not that extensively) and all of them exhibit the same behavior.
This seems to contradict what this answer states:
> In short: gccgo: more optimization, more processors.
I'd think that more optimization means faster binaries, while needing more time to compile...
What's the reason for these three patterns?
Answer 1
Score: 8
The size is different because gc makes static binaries and gccgo links dynamically to libgo. This means that the code for the entire runtime (scheduler, garbage collector, maps, channels) is not in the final binary created by gccgo.
Compile speed will of course favor gc, which was built with speed of compilation in mind. It also generally produces less optimized code, so it has less work to do.
Now on to why gc is still faster. The truth is that neither of them is always faster than the other. For example, try to md5 a file and gccgo will be an order of magnitude faster. Try to implement something with a lot of channels and gc will surely win. You can't always tell ahead of time which will win. gc tends to have more efficient concurrency and gccgo tends to be better at math. However, this is something you need to test on a case-by-case basis, preferably using go test's benchmarking system rather than time.
Answer 2
Score: 8
There are a bunch of differences; bradfitz talked about some of them in a May 2014 talk:
- gccgo can produce a binary that dynamically links in libgo, which makes the output smaller but means the relevant library has to be installed on the target machine. Go binaries without cgo don't have that requirement.
- gccgo does more low-level optimizations because it can use gcc's code generator and optimizer. When I wrote some data-compression code, gccgo ran it noticeably faster than gc. Those same optimizations make the compiler slower: it's doing more work.
- gccgo supports the target processors that gcc does, so it's the only way to get on some architectures like SPARC, ARMv8 (64-bit) or POWER. (Canonical uses it to compile their Juju service orchestration tool for arm64 and ppc64.)
- gccgo and gc both support ARMv7 (32-bit), but according to bradfitz's talk gc does not generate the most efficient ARM code.
- There are certain optimizations only gc has.
  - A big one is escape analysis, in which the compiler determines that some variables will never "escape" the function where they're allocated and therefore can be stack-allocated. (So, surprisingly, new(T) may not heap-allocate if its return value doesn't escape.) This reduces how often garbage collection needs to run.
  - Another is that .s assembler files in the standard library are only linked in by gc, so some stuff like Intel hardware CRC32C isn't used by gccgo by default (you'd have to provide an implementation specifically for gccgo).
- gc implements new language features first and has generally been a minor Go version or two ahead of the latest gccgo.