How does Go compile so quickly?
Question
I've Googled and poked around the Go website, but I can't find an explanation for Go's extraordinary build times. Are they products of the language features (or lack thereof), a highly optimized compiler, or something else? I'm not trying to promote Go; I'm just curious.
Answer 1
Score: 213
Dependency analysis.
The Go FAQ used to contain the following sentence:
> Go provides a model for software
> construction that makes dependency
> analysis easy and avoids much of the
> overhead of C-style include files and
> libraries.
While the phrase is not in the FAQ anymore, this topic is elaborated upon in the talk Go at Google, which compares the dependency analysis approach of C/C++ and Go.
That is the main reason for fast compilation. And this is by design.
Answer 2
Score: 81
I think it's not that Go compilers are fast, it's that other compilers are slow.
C and C++ compilers have to parse enormous amounts of headers - for example, compiling C++ "hello world" requires compiling 18k lines of code, which is almost half a megabyte of sources!
    $ cpp hello.cpp | wc
    18364 40513 433334
Java and C# compilers run in a VM, which means that before they can compile anything, the operating system has to load the whole VM, then they have to be JIT-compiled from bytecode to native code, all of which takes some time.
Speed of compilation depends on several factors.
Some languages are designed to be compiled fast. For example, Pascal was designed to be compiled using a single-pass compiler.
Compilers themselves can be optimized too. For example, the Turbo Pascal compiler was written in hand-optimized assembler, which, combined with the language design, resulted in a really fast compiler working on 286-class hardware. I think that even now, modern Pascal compilers (e.g. FreePascal) are faster than Go compilers.
Answer 3
Score: 45
There are multiple reasons why the Go compiler is much faster than most C/C++ compilers:
- Top reason: Most C/C++ compilers exhibit exceptionally bad designs (from a compilation-speed perspective). Also, from a compilation-speed perspective, some parts of the C/C++ ecosystem (such as the editors in which programmers write their code) aren't designed with speed of compilation in mind.
- Top reason: Fast compilation speed was a conscious choice in the Go compiler and also in the Go language.
- The Go compiler has a simpler optimizer than C/C++ compilers.
- Unlike C++, Go has no templates and no inline functions. This means that Go doesn't need to perform any template or function instantiation.
- The Go compiler generates low-level assembly code sooner and the optimizer works on the assembly code, while in a typical C/C++ compiler the optimization passes work on an internal representation of the original source code. The extra overhead in the C/C++ compiler comes from the fact that the internal representation needs to be generated.
- Final linking (5l/6l/8l) of a Go program can be slower than linking a C/C++ program, because the Go linker goes through all of the used assembly code and may also perform other extra actions that C/C++ linkers don't.
- Some C/C++ compilers (GCC) generate instructions in text form (to be passed to the assembler), while the Go compiler generates instructions in binary form. Extra work (but not much) needs to be done to transform the text into binary.
- The Go compiler targets only a small number of CPU architectures, while the GCC compiler targets a large number of CPUs.
- Compilers that were designed with the goal of high compilation speed, such as Jikes, are fast. On a 2GHz CPU, Jikes can compile 20000+ lines of Java code per second (and the incremental mode of compilation is even more efficient).
Answer 4
Score: 36
Compilation efficiency was a major design goal:
> Finally, it is intended to be fast: it should take at most a few seconds to build a large executable on a single computer. To meet these goals required addressing a number of linguistic issues: an expressive but lightweight type system; concurrency and garbage collection; rigid dependency specification; and so on. FAQ
The language FAQ is also interesting with regard to specific language features that relate to parsing:
> Second, the language has been designed to be easy to analyze and can be parsed without a symbol table.
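
To make the "parsed without a symbol table" point concrete, here is a minimal sketch using the standard `go/parser` package. The source text, the helper name `countDecls`, and the identifiers `Shape` and `scale` are made up for illustration; the snippet refers to names that are never declared, yet it still parses, because building the syntax tree does not require knowing what those names mean.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// undeclaredSrc uses Shape and scale, which are declared nowhere.
// (Hypothetical example source, not taken from the FAQ.)
const undeclaredSrc = `package demo

func area(s Shape) float64 { return s.Width() * scale }
`

// countDecls parses src and returns the number of top-level declarations.
// SkipObjectResolution tells the parser not to resolve identifiers at all.
func countDecls(src string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, parser.SkipObjectResolution)
	if err != nil {
		return 0, err
	}
	return len(file.Decls), nil
}

func main() {
	n, err := countDecls(undeclaredSrc)
	fmt.Println(n, err)
}
```

Type checking would reject this file later, but that is a separate phase; the parser itself never has to ask whether `Shape` is a type or a value.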
Answer 5
Score: 34
While most of the above is true, there is one very important point that was not really mentioned: dependency management.
Go only needs to include the packages that you are importing directly (as those have already imported what they need). This is in stark contrast to C/C++, where every single file starts by including x headers, which include y headers, and so on. Bottom line: Go's compile time is linear with respect to the number of imported packages, whereas C/C++ can take exponential time.
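The linear-versus-exponential contrast above can be illustrated with a toy model. This is a sketch under stated assumptions, not a measurement: the layered graph and the names `layered`, `textualExpansions`, and `exportReads` are invented for the example, and the textual model deliberately ignores include guards and precompiled headers, so it shows the worst case only.

```go
package main

import "fmt"

// graph maps a unit to the units it depends on directly.
type graph map[string][]string

// layered builds a hypothetical graph in which both units of each level
// depend on both units of the next level.
func layered(depth int) graph {
	g := graph{}
	for i := 0; i < depth; i++ {
		next := []string{fmt.Sprintf("a%d", i+1), fmt.Sprintf("b%d", i+1)}
		g[fmt.Sprintf("a%d", i)] = next
		g[fmt.Sprintf("b%d", i)] = next
	}
	return g
}

// textualExpansions counts how often a unit is pulled in when every
// dependency is pasted in textually with no guard and no caching.
func textualExpansions(g graph, unit string) int {
	n := 1
	for _, dep := range g[unit] {
		n += textualExpansions(g, dep)
	}
	return n
}

// exportReads counts reads when every unit is compiled once and each
// importer reads a single export summary per direct import.
func exportReads(g graph) int {
	n := 0
	for _, deps := range g {
		n += len(deps)
	}
	return n
}

func main() {
	g := layered(10)
	fmt.Println("textual expansions:", textualExpansions(g, "a0"))
	fmt.Println("export reads:", exportReads(g))
}
```

With ten levels, the textual model touches units 2047 times, while the per-import model performs 40 reads, one for each edge of the graph.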
Answer 6
Score: 26
A good test for the translation efficiency of a compiler is self-compilation: how long does it take a given compiler to compile itself? For C++ it takes a very long time (hours?). By comparison, a Pascal/Modula-2/Oberon compiler would compile itself in less than one second on a modern machine [1].
Go has been inspired by these languages, but some of the main reasons for this efficiency include:
1. A clearly defined syntax that is mathematically sound, for efficient scanning and parsing.
2. A type-safe and statically-compiled language that uses separate compilation with dependency and type checking across module boundaries, to avoid unnecessary re-reading of header files and re-compiling of other modules - as opposed to independent compilation like in C/C++ where no such cross-module checks are performed by the compiler (hence the need to re-read all those header files over and over again, even for a simple one-line "hello world" program).
3. An efficient compiler implementation (e.g. single-pass, recursive-descent top-down parsing) - which of course is greatly helped by points 1 and 2 above.
These principles were already known and fully implemented in the 1970s and 1980s in languages like Mesa, Ada, Modula-2/Oberon and several others, and are only now (in the 2010s) finding their way into modern languages like Go (Google), Swift (Apple), C# (Microsoft) and several others.
Let's hope that this will soon be the norm and not the exception. To get there, two things need to happen:
- First, software platform providers such as Google, Microsoft and Apple should start by encouraging application developers to use the new compilation methodology, while enabling them to re-use their existing code base. This is what Apple is now trying to do with the Swift programming language, which can co-exist with Objective-C (since it uses the same runtime environment).
- Second, the underlying software platforms themselves should eventually be re-written over time using these principles, while simultaneously redesigning the module hierarchy in the process to make them less monolithic. This is of course a mammoth task and may well take the better part of a decade (if they are courageous enough to actually do it - which I am not at all sure of in the case of Google).
In any case, it's the platform that drives language adoption, and not the other way around.
References:
[1] http://www.inf.ethz.ch/personal/wirth/ProjectOberon/PO.System.pdf, page 6: "The compiler compiles itself in about 3 seconds". This quote is for a low cost Xilinx Spartan-3 FPGA development board running at a clock frequency of 25 MHz and featuring 1 MByte of main memory. From this one can easily extrapolate to "less than 1 second" for a modern processor running at a clock frequency well above 1 GHz and several GBytes of main memory (i.e. several orders of magnitude more powerful than the Xilinx Spartan-3 FPGA board), even when taking I/O speeds into account. Already back in 1990 when Oberon was run on a 25MHz NS32X32 processor with 2-4 MBytes of main memory, the compiler compiled itself in just a few seconds. The notion of actually waiting for the compiler to finish a compilation cycle was completely unknown to Oberon programmers even back then. For typical programs, it always took more time to remove the finger from the mouse button that triggered the compile command than to wait for the compiler to complete the compilation just triggered. It was truly instant gratification, with near-zero wait times. And the quality of the produced code, even though not always completely on par with the best compilers available back then, was remarkably good for most tasks and quite acceptable in general.
Answer 7
Score: 17
Go was designed to be fast, and it shows.
- Dependency management: there are no header files; you only need to look at the packages that are directly imported (no need to worry about what they import), so dependencies are linear.
- Grammar: the grammar of the language is simple and therefore easily parsed. The number of features is also reduced, so the compiler code itself is tight (few paths).
- No overload allowed: you see a symbol, you know which method it refers to.
- It's trivially possible to compile Go in parallel because each package can be compiled independently.
Note that Go isn't the only language with such features (modules are the norm in modern languages), but they did it well.
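The point about compiling packages in parallel can be sketched with goroutines. This is only an illustration of the scheduling idea, not how the `go` tool is implemented: the package names in `demoDeps` and the helper `buildOrder` are hypothetical, "compiling" is reduced to recording a name, and every package must appear as a key in the map.

```go
package main

import (
	"fmt"
	"sync"
)

// demoDeps is a made-up import graph: package -> direct imports.
var demoDeps = map[string][]string{
	"app": {"web", "log"},
	"web": {"log", "io"},
	"log": {"io"},
	"io":  {},
}

// buildOrder starts one goroutine per package. Each goroutine waits only
// for its direct imports, then "compiles" its package and signals
// completion. deps must be acyclic and list every package as a key.
func buildOrder(deps map[string][]string) []string {
	done := make(map[string]chan struct{}, len(deps))
	for pkg := range deps {
		done[pkg] = make(chan struct{})
	}
	var (
		mu    sync.Mutex
		order []string
		wg    sync.WaitGroup
	)
	for pkg, imports := range deps {
		wg.Add(1)
		go func(pkg string, imports []string) {
			defer wg.Done()
			for _, imp := range imports {
				<-done[imp] // block until this import is finished
			}
			mu.Lock()
			order = append(order, pkg) // stand-in for real compilation
			mu.Unlock()
			close(done[pkg])
		}(pkg, imports)
	}
	wg.Wait()
	return order
}

func main() {
	fmt.Println(buildOrder(demoDeps))
}
```

The exact order varies between runs, but an import always finishes before any package that depends on it, and packages with no dependency on each other are free to run at the same time.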
Answer 8
Score: 15
Quoting from the book "The Go Programming Language" by Alan Donovan and Brian Kernighan:
> Go compilation is notably faster than most other compiled languages, even when building from scratch. There are three main reasons for the compiler’s speed. First, all imports must be explicitly listed at the beginning of each source file, so the compiler does not have to read and process an entire file to determine its dependencies. Second, the dependencies of a package form a directed acyclic graph, and because there are no cycles, packages can be compiled separately and perhaps in parallel. Finally, the object file for a compiled Go package records export information not just for the package itself, but for its dependencies too. When compiling a package, the compiler must read one object file for each import but need not look beyond these files.
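
The first reason in the quote, imports listed at the top of each file, can be seen with the standard `go/parser` package, which offers an `ImportsOnly` mode that stops after the import declarations. The sketch below is illustrative only; the sample source `importDemoSrc` and the helper `listImports` are invented names.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// importDemoSrc is a hypothetical source file used for the demonstration.
const importDemoSrc = `package demo

import (
	"fmt"
	"net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "hello")
}
`

// listImports returns the import paths of src. With parser.ImportsOnly the
// parser stops after the import declarations instead of reading the rest
// of the file.
func listImports(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, spec := range file.Imports {
		path, err := strconv.Unquote(spec.Path.Value) // strip the quotes
		if err != nil {
			return nil, err
		}
		paths = append(paths, path)
	}
	return paths, nil
}

func main() {
	fmt.Println(listImports(importDemoSrc))
}
```

A build tool can therefore work out the whole dependency graph by reading just the head of every file before compiling anything.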
Answer 9
Score: 10
The basic idea of compilation is actually very simple. A recursive-descent parser, in principle, can run at I/O bound speed. Code generation is basically a very simple process. A symbol table and a basic type system do not require a lot of computation.
However, it is not hard to slow down a compiler.
If there is a preprocessor phase, with multi-level include directives, macro definitions, and conditional compilation, as useful as those things are, it is not hard to load it down. (For one example, I'm thinking of the Windows and MFC header files.) That is why precompiled headers are necessary.
In terms of optimizing the generated code, there is no limit to how much processing can be added to that phase.
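To show how little machinery a recursive-descent parser needs, here is a minimal sketch for integer arithmetic. It is not taken from any real compiler: the grammar, the `parser` type, and `eval` are invented for the example, and there is no error handling, so malformed input is simply read as far as it goes.

```go
package main

import "fmt"

// parser holds the input and the current read position.
type parser struct {
	src string
	pos int
}

// peek skips spaces and returns the next byte, or 0 at end of input.
func (p *parser) peek() byte {
	for p.pos < len(p.src) && p.src[p.pos] == ' ' {
		p.pos++
	}
	if p.pos < len(p.src) {
		return p.src[p.pos]
	}
	return 0
}

// expr := term { ("+" | "-") term }
func (p *parser) expr() int {
	v := p.term()
	for {
		switch p.peek() {
		case '+':
			p.pos++
			v += p.term()
		case '-':
			p.pos++
			v -= p.term()
		default:
			return v
		}
	}
}

// term := factor { "*" factor }
func (p *parser) term() int {
	v := p.factor()
	for p.peek() == '*' {
		p.pos++
		v *= p.factor()
	}
	return v
}

// factor := number | "(" expr ")"
func (p *parser) factor() int {
	if p.peek() == '(' {
		p.pos++
		v := p.expr()
		if p.peek() == ')' {
			p.pos++
		}
		return v
	}
	n := 0
	for p.pos < len(p.src) && p.src[p.pos] >= '0' && p.src[p.pos] <= '9' {
		n = n*10 + int(p.src[p.pos]-'0')
		p.pos++
	}
	return n
}

// eval parses and evaluates src in a single left-to-right pass.
func eval(src string) int {
	p := &parser{src: src}
	return p.expr()
}

func main() {
	fmt.Println(eval("2 + 3 * (4 - 1)"))
}
```

Each grammar rule is one function, every input byte is visited once, and there is no backtracking, which is why this style of parser is usually limited by how fast the source can be read.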
Answer 10
Score: 8
Simply put (in my own words): because the syntax is very easy to analyze and to parse.
For instance, no type inheritance means no problematic analysis to find out whether a new type follows the rules imposed by its base type.
For instance, in this code example: "interfaces", the compiler doesn't check whether the intended type implements the given interface while analyzing that type. The check is performed only when the type is used (and only if it is used).
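Since the linked "interfaces" example is not reproduced here, the following is a small stand-in sketch; the names `Shape`, `Square`, `Circle`, and `describe` are hypothetical. `Circle` lacks the `Area` method, yet the program compiles, because nothing ever uses a `Circle` where a `Shape` is required.

```go
package main

import "fmt"

// Shape is satisfied implicitly by any type with an Area method.
type Shape interface {
	Area() float64
}

// Square implements Shape without declaring that it does.
type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

// Circle has no Area method. Declaring it is fine on its own: the
// compiler only checks interface satisfaction where a value is used
// as a Shape.
type Circle struct{ Radius float64 }

// describe accepts any Shape; the check happens at each call site.
func describe(s Shape) string {
	return fmt.Sprintf("area=%.1f", s.Area())
}

func main() {
	fmt.Println(describe(Square{Side: 2}))
	// describe(Circle{Radius: 1}) // uncommenting this line is a compile error
}
```

The program prints `area=4.0`. The commented-out call is the only place where the compiler would have to compare `Circle` against `Shape`.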
Another example: the compiler tells you if you declare a variable and don't use it (or if you are supposed to hold a return value and you don't).
The following doesn't compile:
    package main

    func main() {
        var a int
        a = 0
    }

    notused.go:3: a declared and not used
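
The same diagnostic can be reproduced outside the compiler with the standard `go/types` type checker. This is a sketch, not the compiler's own code path: the helper `reportsUnused` and the two sample sources are invented, and because the exact wording of the message differs between Go releases, the check only looks for the phrase "not used".

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"strings"
)

// unusedSrc assigns to a but never reads it.
const unusedSrc = `package main

func main() {
	var a int
	a = 0
}
`

// usedSrc is the same program with a read of a added.
const usedSrc = `package main

func main() {
	var a int
	a = 0
	_ = a
}
`

// reportsUnused type-checks src and reports whether any error mentions
// an unused variable.
func reportsUnused(src string) bool {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "notused.go", src, 0)
	if err != nil {
		return false
	}
	found := false
	conf := types.Config{
		// Error is called for every type error instead of stopping at the first.
		Error: func(err error) {
			if strings.Contains(err.Error(), "not used") {
				found = true
			}
		},
	}
	conf.Check("main", fset, []*ast.File{file}, nil) // errors arrive via conf.Error
	return found
}

func main() {
	fmt.Println(reportsUnused(unusedSrc), reportsUnused(usedSrc))
}
```

Assigning to `a` does not count as a use, which is why the first program is rejected even though `a` appears twice.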
These kinds of enforcement and principles make the resulting code safer, and the compiler doesn't have to perform extra validations that the programmer can do.
By and large, all these details make a language easier to parse, which results in fast compilation.
Again, in my own words.
Answer 11
Score: 2
- Go imports dependencies once for all files, so the import time doesn't increase exponentially with project size.
- Simpler linguistics means interpreting them takes less computing.
What else?