Semantic information in AST

Question

Suppose we have a classic abstract syntax tree (AST) of a program. Is it worth putting purely semantic information (the type of a referenced variable, its size in bytes, etc.) into the AST? I need all of this for different kinds of analysis. Storing it in the tree would, I guess, make it available to multiple analyzers, each of which works as a single AST traversal and does not depend on the others. On the other hand, syntactic and semantic information would then be stored in one place.

Another way is to keep the AST minimal, representing only syntactic details. But that forces us to do the same work (maintaining symbol tables, etc.) in different analyzers.

P.S. By analyzers I mean variable usage, type, and control flow analysis.

For example, Clang puts implicit (not really syntax) information such as type conversions into its AST:

    `-FunctionDecl 0x5aeab50 <test.cc:1:1, line:4:1> f 'int (int)'
      |-ParmVarDecl 0x5aeaa90 <line:1:7, col:11> x 'int'
      `-CompoundStmt 0x5aead88 <col:14, line:4:1>
        |-DeclStmt 0x5aead10 <line:2:3, col:24>
        | `-VarDecl 0x5aeac10 <col:3, col:23> result 'int'
        |   `-ParenExpr 0x5aeacf0 <col:16, col:23> 'int'
        |     `-BinaryOperator 0x5aeacc8 <col:17, col:21> 'int' '/'
        |       |-ImplicitCastExpr 0x5aeacb0 <col:17> 'int' <LValueToRValue>
        |       | `-DeclRefExpr 0x5aeac68 <col:17> 'int' lvalue ParmVar 0x5aeaa90 'x' 'int'
        |       `-IntegerLiteral 0x5aeac90 <col:21> 'int' 42
        `-ReturnStmt 0x5aead68 <line:3:3, col:10>
          `-ImplicitCastExpr 0x5aead50 <col:10> 'int' <LValueToRValue>
            `-DeclRefExpr 0x5aead28 <col:10> 'int' lvalue Var 0x5aeac10 'result' 'int'
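
To make the dump concrete, here is a minimal Python sketch of a semantic pass that adds this kind of implicit node to a tree. The node classes (`DeclRef`, `IntLit`, `BinOp`, `ImplicitCast`) and the function `insert_lvalue_casts` are hypothetical names for illustration and are not Clang's API. The sketch only mirrors the LValueToRValue wrapping visible in the dump, and it assumes both operands of a binary operator already have the same type.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class DeclRef:
    name: str
    type: str


@dataclass
class IntLit:
    value: int
    type: str = "int"


@dataclass
class ImplicitCast:
    kind: str
    operand: "Expr"
    type: str


@dataclass
class BinOp:
    op: str
    lhs: "Expr"
    rhs: "Expr"
    type: str = ""


Expr = Union[DeclRef, IntLit, ImplicitCast, BinOp]


def insert_lvalue_casts(expr: Expr) -> Expr:
    """Return a tree in which every variable read is wrapped in an implicit cast."""
    if isinstance(expr, DeclRef):
        return ImplicitCast("LValueToRValue", expr, expr.type)
    if isinstance(expr, BinOp):
        lhs = insert_lvalue_casts(expr.lhs)
        rhs = insert_lvalue_casts(expr.rhs)
        # Simplification (assumption): the result takes the left operand's type.
        return BinOp(expr.op, lhs, rhs, lhs.type)
    return expr  # literals and existing casts are left unchanged


# Corresponds to the expression x / 42 from the dump.
tree = insert_lvalue_casts(BinOp("/", DeclRef("x", "int"), IntLit(42)))
```

After the pass, the tree carries information that is not in the source text at all, which is the trade-off the question is about.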

I know nobody forbids doing whatever you want, but I am looking for a proper, well-known way in the compiler community to do this kind of semantic analysis. This question can be treated as asking for advice, not for a solution.

Answer 1

Score: 1

If you decide to store semantic information in the tree, you'll have to decide what, concretely, to store. If you have 5 useful things, you'll be tempted to add those 5 things in a struct which is part of the node.

Having gotten that to be useful, you'll be tempted over time to add more. Whatever you've attached to the AST node will keep growing because you'll find more and more things to add. What you'll end up with is a giant "struct" (or class or whatever).

We call these things God classes. You don't want them.

It is better to keep the information for each semantic analysis isolated.

We (my company Semantic Designs) use associations (in our multilingual DMS Toolkit) to do this, driven by a hash code associated with the AST node.

Then you can have as many types of semantic information as you please, associated only with the nodes that need it, only when you need it.
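
A minimal Python sketch of that idea, assuming side tables keyed by node identity. This is not DMS's implementation, which is described here only as hash-driven associations; the `Node` and `Association` classes and the two example analyses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be dict keys
class Node:
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


class Association:
    """Side table that maps AST nodes to the facts of one analysis."""

    def __init__(self) -> None:
        self._facts: Dict[Node, Any] = {}

    def set(self, node: Node, fact: Any) -> None:
        self._facts[node] = fact

    def get(self, node: Node, default: Optional[Any] = None) -> Any:
        return self._facts.get(node, default)


def walk(node: Node) -> Iterator[Node]:
    yield node
    for child in node.children:
        yield from walk(child)


def analyze_types(root: Node, declared: Dict[str, str]) -> Association:
    """Attach a type only to the nodes that have one."""
    types = Association()
    for node in walk(root):
        if node.kind == "VarRef":
            types.set(node, declared[node.text])
        elif node.kind == "IntLit":
            types.set(node, "int")
    return types


def analyze_uses(root: Node) -> Association:
    """Count variable reads; stores a single fact on the root node."""
    counts: Dict[str, int] = {}
    for node in walk(root):
        if node.kind == "VarRef":
            counts[node.text] = counts.get(node.text, 0) + 1
    uses = Association()
    uses.set(root, counts)
    return uses


x_ref = Node("VarRef", "x")
root = Node("BinOp", "/", [x_ref, Node("IntLit", "42")])
types = analyze_types(root, {"x": "int"})
uses = analyze_uses(root)
```

The `Node` class never gains a field when a new analysis is added; each analysis owns its table and can discard it when done.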

It also allows you to have multiple threads processing the tree. They can, by design, either share the associations or build thread-specific instances as needed.
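
A small Python sketch of the thread-specific option, under the assumption that the tree is treated as read-only while analyses run. The `Node` class and the two analyses (`depth_table`, `size_table`) are hypothetical. Each worker builds and returns its own table, so no locking is needed; sharing one table across threads would require synchronization that is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


class Node:
    """AST node that analyses only read; hashing is by object identity."""

    def __init__(self, kind: str, children: Optional[List["Node"]] = None) -> None:
        self.kind = kind
        self.children = children or []


def depth_table(root: Node) -> Dict[Node, int]:
    """Thread-specific association: node -> depth in the tree."""
    table: Dict[Node, int] = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        table[node] = depth
        stack.extend((child, depth + 1) for child in node.children)
    return table


def size_table(root: Node) -> Dict[Node, int]:
    """Thread-specific association: node -> number of nodes in its subtree."""
    table: Dict[Node, int] = {}

    def visit(node: Node) -> int:
        table[node] = 1 + sum(visit(child) for child in node.children)
        return table[node]

    visit(root)
    return table


leaf_a, leaf_b = Node("VarRef"), Node("IntLit")
root = Node("BinOp", [leaf_a, leaf_b])

# Two analyses traverse the same tree concurrently, each writing only to its own table.
with ThreadPoolExecutor(max_workers=2) as pool:
    depth_future = pool.submit(depth_table, root)
    size_future = pool.submit(size_table, root)
    depths, sizes = depth_future.result(), size_future.result()
```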

This has held up nicely over the 25-year lifespan of DMS, across many languages and many, many semantically driven tools.

huangapple
  • Published on June 13, 2023, 05:01:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76460308.html