2022-07-15 06:36:19 · go

Should tokens be part of AST Nodes

Question

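In a compiler front end, processing usually runs: stream of characters -> stream of tokens -> abstract syntax tree (AST). Each stage abstracts something away. In my view, a token should not be part of an AST node.

Should a token be part of an AST node?
Can you give examples of which programming languages choose which approach?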

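Many compiler implementations keep tokens out of AST nodes. Tokens are produced during lexical analysis and represent the basic lexical units of the code, such as keywords, identifiers, and operators. The AST is produced during parsing and represents the syntactic structure of the code.

However, some implementations do make the token part of the AST node, usually to simplify parsing and semantic analysis. The token's information is then attached directly to the node, which makes error reporting easier.

Overall, whether tokens are part of AST nodes is a design and implementation choice. Many implementations prefer to keep them separate so that the structure of the code can be organized and processed independently of lexical detail.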

Some background on why I ask this question.

I'm reading Writing An Interpreter In Go. In the book, a Token struct is stored inside each AST node, and Node is an interface satisfied by implementing TokenLiteral() and String():

type IntegerLiteral struct {
	Token token.Token
	Value int64
}

type Node interface {
	TokenLiteral() string
	String() string
}
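To make the book-style design concrete, here is a minimal sketch of a node that keeps its originating token. It is not the book's code: the `Token` struct is a simplified stand-in for the book's `token.Token`, and `parseIntegerLiteral` is a hypothetical helper added for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// Token is a hypothetical, simplified stand-in for the book's token.Token.
type Token struct {
	Type    string // token kind, e.g. "INT"
	Literal string // exact source text, e.g. "5"
}

// Node mirrors the interface quoted above.
type Node interface {
	TokenLiteral() string
	String() string
}

// IntegerLiteral keeps the token it was parsed from next to the decoded value.
type IntegerLiteral struct {
	Token Token
	Value int64
}

// TokenLiteral returns the source text of the token stored in the node.
func (il *IntegerLiteral) TokenLiteral() string { return il.Token.Literal }

// String renders the node; for a literal that is just its source text.
func (il *IntegerLiteral) String() string { return il.Token.Literal }

// parseIntegerLiteral is a hypothetical helper that builds the node from one token.
func parseIntegerLiteral(tok Token) (*IntegerLiteral, error) {
	v, err := strconv.ParseInt(tok.Literal, 0, 64)
	if err != nil {
		return nil, fmt.Errorf("could not parse %q as integer: %w", tok.Literal, err)
	}
	return &IntegerLiteral{Token: tok, Value: v}, nil
}

func main() {
	node, err := parseIntegerLiteral(Token{Type: "INT", Literal: "5"})
	if err != nil {
		fmt.Println(err)
		return
	}
	var n Node = node // IntegerLiteral satisfies Node
	fmt.Println(n.TokenLiteral(), node.Value)
}
```

In this sketch the node owns both the raw text (through the token) and the decoded value, so later stages never have to go back to the lexer.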

I understand that a real compiler must report the line and column of an error, and that syntax errors are detected in the parser rather than the lexer, so position information must be carried past the lexer. For example, Go's go/ast package requires every AST node to expose positions (Pos is defined in go/token):

type Pos int

// All node types implement the Node interface.
type Node interface {
	Pos() token.Pos // position of first character belonging to the node
	End() token.Pos // position of first character immediately after the node
}
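As an illustration of how these positions are used, the following sketch parses an expression with the standard `go/parser` package and resolves a node's `Pos()` to a line and column through a `token.FileSet`. The `LitInfo` struct, the `firstLiteral` helper, and the file name `example.go` are hypothetical names chosen for this example. It also shows that `ast.BasicLit` stores the token kind and literal text as plain fields on the node.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// LitInfo is a hypothetical result type for this example.
type LitInfo struct {
	Kind   string // token kind of the literal, e.g. "INT"
	Text   string // literal text as written in the source
	Line   int
	Column int
}

// firstLiteral is a hypothetical helper: it parses src as a Go expression and
// describes the first basic literal found in the resulting AST.
func firstLiteral(src string) (LitInfo, error) {
	fset := token.NewFileSet()
	expr, err := parser.ParseExprFrom(fset, "example.go", src, 0)
	if err != nil {
		return LitInfo{}, err
	}
	var info LitInfo
	found := false
	ast.Inspect(expr, func(n ast.Node) bool {
		if found {
			return false
		}
		if lit, ok := n.(*ast.BasicLit); ok {
			// Pos() is a compact offset; the FileSet turns it into line and column.
			pos := fset.Position(lit.Pos())
			info = LitInfo{lit.Kind.String(), lit.Value, pos.Line, pos.Column}
			found = true
		}
		return !found
	})
	if !found {
		return LitInfo{}, fmt.Errorf("no basic literal in %q", src)
	}
	return info, nil
}

func main() {
	info, err := firstLiteral("x + 42")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s %q at %d:%d\n", info.Kind, info.Text, info.Line, info.Column)
}
```

Here the node carries the token's kind, text, and position without holding a separate token object, which is one middle ground between the two designs discussed in the question.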

Long version of my question

AFAIK, a compiler front end works like this: stream of chars -> stream of tokens -> AST. At each level, "some things" are abstracted away. In my eyes, a token should not be part of an AST node.

Should a token be part of an AST node?
Could you give examples of which programming languages choose which way?
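The pipeline described above can be observed with Go's own tooling. The sketch below runs the lexing stage with `go/scanner` and the parsing stage with `go/parser` on the same input. The `Tok` struct and the `tokenize` and `rootKind` helpers are hypothetical names for this example, and the file name `example.go` is arbitrary.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/scanner"
	"go/token"
)

// Tok is a hypothetical record of one scanned token.
type Tok struct {
	Kind   string // token kind, e.g. "IDENT", "+", "INT"
	Lit    string // literal text for identifiers and literals
	Line   int
	Column int
}

// tokenize runs only the lexing stage: characters in, tokens out.
func tokenize(src string) []Tok {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0)
	var out []Tok
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		p := fset.Position(pos)
		out = append(out, Tok{tok.String(), lit, p.Line, p.Column})
	}
	return out
}

// rootKind runs the parsing stage and reports the Go type of the root AST node.
func rootKind(src string) (string, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%T", expr), nil
}

func main() {
	// The scanner may also emit an automatically inserted semicolon at the end.
	for _, t := range tokenize("x + 42") {
		fmt.Printf("%d:%d %s %q\n", t.Line, t.Column, t.Kind, t.Lit)
	}
	kind, err := rootKind("x + 42")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("root node:", kind)
}
```

The token stream is flat, while the parser output is a single tree node whose children hold the operands. What each stage keeps or discards is exactly the design choice the question asks about.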

Answer 1

Score: 0

The exact nature of the AST is an implementation detail of the compiler (or parsing library) and different AST implementations will have different fields, even different AST implementations for the same language.

It is almost always the case that there will be some mechanism for extracting source location information from an AST node, both for error messages and for debugging information embedded in the compiled output. That could be done by adding a location object (or objects) to every AST node type. Alternatively, the location information could be held in token objects somehow discoverable from the AST node. Or an implementation could mix these strategies and provide a Location getter method.
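The mixed strategy can be sketched as follows. Every name here (`Location`, `Token`, `Identifier`, `IntegerLiteral`, `BinaryExpr`, `errorAt`) is hypothetical and not taken from any particular compiler. One node type keeps its whole token, another stores a location directly, and a third derives its location from a child, all behind the same getter.

```go
package main

import "fmt"

// Location is a hypothetical source position.
type Location struct {
	Line, Column int
}

func (l Location) String() string { return fmt.Sprintf("%d:%d", l.Line, l.Column) }

// Token is a hypothetical lexer token that records where it was found.
type Token struct {
	Kind    string
	Literal string
	Loc     Location
}

// Node hides how each node type stores its position.
type Node interface {
	Location() Location
}

// Identifier keeps its whole token; the position lives inside the token.
type Identifier struct {
	Tok Token
}

func (id *Identifier) Location() Location { return id.Tok.Loc }

// IntegerLiteral stores a location object directly and keeps no token.
type IntegerLiteral struct {
	Value int64
	Loc   Location
}

func (il *IntegerLiteral) Location() Location { return il.Loc }

// BinaryExpr stores no position of its own; it reports where its left operand starts.
type BinaryExpr struct {
	Left  Node
	Op    string
	Right Node
}

func (b *BinaryExpr) Location() Location { return b.Left.Location() }

// errorAt formats a diagnostic using only the Node interface.
func errorAt(n Node, msg string) string {
	return fmt.Sprintf("%s: %s", n.Location(), msg)
}

func main() {
	id := &Identifier{Tok: Token{Kind: "IDENT", Literal: "x", Loc: Location{3, 5}}}
	lit := &IntegerLiteral{Value: 1, Loc: Location{3, 9}}
	expr := &BinaryExpr{Left: id, Op: "+", Right: lit}
	fmt.Println(errorAt(expr, "undefined: x"))
}
```

Because callers only see `Location()`, the storage decision can differ per node type and can change later without touching the error-reporting code.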

I can't think of a good reason to either insist on or prohibit token objects in an AST. An AST node for a single-token literal or identifier might well keep its data in a Token object. Why not?
