No symbol table in Go?

Question

Google's new language "Go" says on its website (http://golang.org/doc/go_lang_faq.html#different_syntax):

> the language has been designed to be easy to analyze and can be parsed without a symbol table

I'm certainly no expert on these matters, but I thought a symbol table was a basic construct common to all compilers for languages that use variables, and Go clearly uses variables. What am I not understanding?

Answer 1

Score: 25

Parsing means just figuring out the program structure: separating the module into statements/declarations, breaking expressions down to sub-expressions, etc. You end up with a tree structure, known as a "parse tree", or "abstract syntax tree" (AST).

Apparently, C++ requires a symbol table to do parsing.

This page (http://compilers.iecc.com/comparch/article/98-07-199) discusses some reasons why C++ requires a symbol table for parsing.

Of course, parsing is only a part of compilation, and you will need a symbol table to do a full compilation.

However, parsing itself can be useful in writing analysis tools (e.g. which module imports which modules). So, simplifying the parsing process means it's easier to write code analysis tools.

Answer 2

Score: 11

@Justice is right. To expand on that a little, in C the only actual tricky part is telling types apart from variables. Specifically when you see this:

T t;

You need to know that T is a type for that to be a legal parse. That's something you have to look up in a symbol table. This is relatively simple to figure out as long as types are added to the symbol table as the parse continues. You don't need to do much extra work in the compiler: either T is present in the table or it isn't.
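As a rough illustration of that lookup, here is a toy sketch (written in Go, not taken from any real C compiler). The `typeNames` table and the `classify` function are hypothetical, and the "grammar" covers only two statement shapes. It shows the same text `T t;` being rejected before `T` is registered as a type and accepted afterwards.

```go
package main

import (
	"fmt"
	"strings"
)

// typeNames is a toy symbol table: the set of identifiers known to name types.
type typeNames map[string]bool

// classify handles a tiny, hypothetical subset of C statements:
//
//	"typedef <known-type> <name>;" registers <name> as a type
//	"<a> <b>;" is a declaration only if <a> is a known type
func classify(stmt string, types typeNames) string {
	fields := strings.Fields(strings.TrimSuffix(strings.TrimSpace(stmt), ";"))
	switch {
	case len(fields) == 3 && fields[0] == "typedef" && types[fields[1]]:
		types[fields[2]] = true // the table grows as the parse continues
		return "typedef"
	case len(fields) == 2 && types[fields[0]]:
		return "declaration"
	default:
		return "syntax error"
	}
}

func main() {
	types := typeNames{"int": true}
	for _, stmt := range []string{"T t;", "typedef int T;", "T t;"} {
		fmt.Printf("%-16s -> %s\n", stmt, classify(stmt, types))
	}
}
```

The only state the parser consults is whether the name is in the table, which is the "either T is present or it isn't" check described above.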

In C++ things are much, much more complicated. There are enormous numbers of ambiguous or potentially ambiguous constructs. The most obvious is this one:

B::C (c);

Aside from the fact that it's not clear if B is a class, a typedef, or a namespace, it's also not clear if C is a type and c an object of that type, or if C is a function (or constructor) taking c as an argument (or even if C is an object with operator() overloaded). You need the symbol table to carry on parsing, although it is still possible to continue quickly enough, as the type of the symbol is in the symbol table.

Things get much, much, much worse than that when templates come into the mix. If C (c) is in a template, you might not know, at the point where the template is defined, whether C is a type or a function/object. That's because C can depend on a template parameter, so it can turn out to be either a type or a variable. What this means is that you need the symbol table, but you don't have one, and you can't have one until the template is actually instantiated. Even worse, it's not necessarily sufficient to have just the type of the symbol: you can come up with situations which require the full information of the type the symbol represents, including size, alignment, and other machine-specific information.

All this has several practical effects. The two most significant I would say are:

  • Compilation is much faster. I assume Go is faster to compile than C, and C++ has famously slow compilation times for situations involving a lot of templates.
  • You can write parsers that don't depend on having a full compiler. This is very useful for doing code analysis and for refactoring.

Answer 3

Score: 10

Interpretation and compilation absolutely require symbol tables or similar. This is true for nearly all languages.

In C and C++, even parsing the language requires a symbol table.

Answer 4

Score: 4

To parse most languages you need to know when names are variables, types or functions to disambiguate certain constructs. Go has no such ambiguous constructs.

For instance:

int x = Foo(bar);

In a C++-style language, Foo could be a type or a function, and the two cases are represented by different AST node types, so the parser has to look Foo up before it can build the tree. In Go, the parser never has to do lookups on symbols to know how to construct the AST. The grammar and the AST are just simpler than in most languages. Pretty cool really.
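Here is a small sketch of the Go side of this, using the standard `go/parser` package. `Foo` and `bar` are undeclared, hypothetical names and `describe` is a helper made up for this example. The parser builds a tree for `Foo(bar)` without knowing what `Foo` is, and it picks the same node type for a conversion such as `int64(bar)`. Telling the two apart is left to a later type-checking phase.

```go
package main

import (
	"fmt"
	"go/parser"
)

// describe parses a single Go expression and reports the AST node type
// the parser chose for it, without resolving any identifiers.
func describe(expr string) (string, error) {
	node, err := parser.ParseExpr(expr)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%T", node), nil
}

func main() {
	// Neither Foo nor bar is declared anywhere; parsing still succeeds.
	for _, expr := range []string{"Foo(bar)", "int64(bar)"} {
		kind, err := describe(expr)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(expr, "=>", kind)
	}
}
```

Both expressions come back as `*ast.CallExpr`, so the shape of the tree does not depend on what the name refers to.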

Answer 5

Score: 1

Symbol tables are slow and generally not needed, so Go chose to do away with them. Functional languages also need none.

Fast lookup requires a hash, but to support nested scopes you need to push/pop names onto a stack. Simple symtabs are implemented as a linearly searched stack, better symtabs as a hash with a stack per symbol. But still, the search has to be done at run-time.

Interpretation and compilation for lexically scoped languages require absolutely no symbol tables or similar. Only dynamically scoped symbols need symbol tables, and some compilers for strictly typed languages need some kind of internal symbol table to hold the type annotations.

In C and C++, even parsing the language requires a symbol table, because you need to store the types and declarations of globals and functions.

Lexically scoped symbols are not stored in symtabs but as an indexed list of names in block frames, as in functional languages. Those indices are computed at compile-time, so run-time access is immediate. Leaving the scope makes those vars inaccessible automatically, so you don't need to push/pop names from namespaces/symtabs.
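A minimal sketch of that idea, assuming a hypothetical `scope` type with `declare` and `resolve` methods (not taken from any particular compiler): each name is turned into a (depth, index) pair once, at compile time, so the running program can address the variable by position and never search by name. Arguably this compile-time bookkeeping is itself a small symbol table; the point is that it is not needed at run-time.

```go
package main

import "fmt"

// scope is one lexical block: the names it declares, in declaration order,
// plus a link to the enclosing block.
type scope struct {
	names  []string
	parent *scope
}

// declare adds name to the block and returns its slot index in the frame.
func (s *scope) declare(name string) int {
	s.names = append(s.names, name)
	return len(s.names) - 1
}

// resolve reports how many blocks outward the name lives (depth) and its
// slot index in that block. ok is false if the name is not in scope.
func (s *scope) resolve(name string) (depth, index int, ok bool) {
	for cur := s; cur != nil; cur = cur.parent {
		// Search newest-first so a later declaration shadows an earlier one.
		for i := len(cur.names) - 1; i >= 0; i-- {
			if cur.names[i] == name {
				return depth, i, true
			}
		}
		depth++
	}
	return 0, 0, false
}

func main() {
	outer := &scope{}
	outer.declare("x")
	outer.declare("y")
	inner := &scope{parent: outer}
	inner.declare("x") // shadows the outer x

	for _, name := range []string{"x", "y", "z"} {
		depth, index, ok := inner.resolve(name)
		fmt.Println(name, depth, index, ok)
	}
}
```

Leaving a block just means dropping its `scope` value, which matches the remark above that nothing has to be popped by name.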

Less functional languages without first-class functions often need to store their function names in symbol tables. As a language designer you try to bind functions to lexicals instead, to be able to get rid of dynamic name lookup in symtabs.

huangapple
  • Posted on November 13, 2009, 06:52:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/1725975.html