June 8, 2023 12:43:32


How can left-factoring eliminate backtracking while there are non-terminals?

Question

I'm trying to implement a parser generator (for fun), following the book Engineering a Compiler. The book suggests left-factoring for eliminating backtracking:

A -> B m | B n | m
B -> m

Becomes:

A -> B A' | m
B -> m
A' -> m | n
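To make the book's transformation concrete, here is a minimal Python sketch of one round of left-factoring. The grammar encoding (a dict from non-terminal to a list of symbol tuples, with () as epsilon) and the names common_prefix and left_factor_once are assumptions for illustration, not the book's code. It groups alternatives purely by their leading symbol, which is exactly how A -> B m | B n | m becomes A -> B A' | m: the sketch never looks inside B.

```python
# Assumed encoding for this sketch: each non-terminal maps to a list of
# alternatives, and each alternative is a tuple of symbols (() is epsilon).
Grammar = dict[str, list[tuple[str, ...]]]


def common_prefix(alts: list[tuple[str, ...]]) -> tuple[str, ...]:
    """Longest sequence of symbols that every alternative in alts starts with."""
    prefix = alts[0]
    for alt in alts[1:]:
        n = 0
        while n < min(len(prefix), len(alt)) and prefix[n] == alt[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def left_factor_once(grammar: Grammar, nt: str) -> Grammar:
    """One round of left-factoring for nt, grouping alternatives by leading symbol."""
    groups: dict[tuple[str, ...], list[tuple[str, ...]]] = {}
    for alt in grammar[nt]:
        groups.setdefault(alt[:1], []).append(alt)  # key () for an epsilon alternative

    result = {k: list(v) for k, v in grammar.items()}
    result[nt] = []
    for alts in groups.values():
        if len(alts) == 1:
            result[nt].append(alts[0])  # nothing to factor
            continue
        prefix = common_prefix(alts)
        fresh = nt + "'"
        while fresh in result:  # pick an unused name: A', A'', ...
            fresh += "'"
        result[nt].append(prefix + (fresh,))
        result[fresh] = [alt[len(prefix):] for alt in alts]  # may include () = epsilon
    return result
```

Applied to {"A": [("B", "m"), ("B", "n"), ("m",)], "B": [("m",)]} with nt="A", it yields A -> B A' | m and A' -> m | n, leaving B unchanged.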

But now B starts with m, A' can start with m, and A has another alternative that also starts with m. The FIRST sets of A's two alternatives, B A' and m, therefore both contain m, which disqualifies the grammar for an LL(1) parser.
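The conflict can be checked mechanically by computing FIRST sets to a fixed point. This is a minimal Python sketch under assumed conventions: the grammar is a dict from non-terminal to a list of symbol tuples, any symbol that is not a key is a terminal, and the empty string marks epsilon inside FIRST sets. The function names are hypothetical.

```python
Grammar = dict[str, list[tuple[str, ...]]]
EPS = ""  # marker for epsilon inside FIRST sets (an assumption of this sketch)


def first_of_seq(seq: tuple[str, ...], grammar: Grammar,
                 first: dict[str, set[str]]) -> set[str]:
    """FIRST of a symbol sequence; contains EPS only if the whole sequence can vanish."""
    out: set[str] = set()
    for sym in seq:
        if sym not in grammar:  # terminal: it is the first symbol, stop here
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:  # sym cannot vanish, so later symbols don't matter
            return out
    out.add(EPS)
    return out


def first_sets(grammar: Grammar) -> dict[str, set[str]]:
    """Iterate until no FIRST set grows any more (fixed point)."""
    first: dict[str, set[str]] = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_seq(alt, grammar, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first
```

For the factored grammar A -> B A' | m, B -> m, A' -> m | n, FIRST(B A') and FIRST(m) are both {m}, so their intersection is {m}: the parser cannot pick an alternative of A from one token of lookahead.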

Is the grammar above inherently not backtrack-free, so that it requires a more powerful parser such as an LR(1) parser? Am I applying left-factoring wrong? Or is there a step needed before left-factoring so that no rule starts with a non-terminal (is that even possible)?

I feel the book leaves out explanations in a lot of places, and this is one example.

The grammar above is a simplified version of the issue I'm facing with the following toy grammar:

start           -> fn_declaration start | fn_declaration |
fn_call         -> ID ( args ) ;
args            -> args , arg | arg |
arg             -> STR | INT | ID
fn_declaration  -> FN ID ( params ) { statements }
params          -> param , params | param |
param           -> ID ID
statements      -> statement , statements | statement |
statement       -> declaration | assignment | fn_call | ret
declaration     -> ID ID ;
assignment      -> ID = expressions ;
expressions     -> terms + expressions | terms - expressions | terms
terms           -> factor * terms | factor / terms | factor
factor          -> ( expressions ) | INT | ID
ret             -> RETURN expressions ;

I apply these steps to it, in this order:

1. Eliminate epsilons.
2. Eliminate left recursion.
3. Eliminate backtracking (by left-factoring).
4. Calculate FIRST and FOLLOW sets.
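Step 2 is usually done with the standard rewrite of immediate left recursion: A -> A α | β becomes A -> β A', A' -> α A' | ε. Below is a minimal Python sketch of that rewrite, applied to the args rule; the grammar encoding and the function name are assumptions for illustration. Note that the rewrite itself introduces an ε-alternative, so epsilons come back after step 1 has removed them.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def remove_immediate_left_recursion(grammar: Grammar, nt: str) -> Grammar:
    """Rewrite A -> A a1 | ... | b1 | ... into A -> b1 A' | ... and A' -> a1 A' | ... | eps."""
    recursive = [alt[1:] for alt in grammar[nt] if alt[:1] == (nt,)]  # the alpha parts
    others = [alt for alt in grammar[nt] if alt[:1] != (nt,)]         # the beta parts
    if not recursive:
        return grammar  # nothing to do
    fresh = nt + "'"
    while fresh in grammar:  # pick an unused name
        fresh += "'"
    result = {k: list(v) for k, v in grammar.items()}
    result[nt] = [beta + (fresh,) for beta in others]
    result[fresh] = [alpha + (fresh,) for alpha in recursive] + [()]  # () is epsilon
    return result
```

For args -> args , arg | arg this produces args -> arg args' and args' -> , arg args' | ε.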

Each step does its job, but I end up with these rules:

statement => alt=[ declaration | assignment | ret | Id ( statement_p0 ], first=[Id, return, ;], follow=[,, }]
statement_p0 => alts=[ args ) ; | ) ;], first=[), Int, Str, Id], follow=[,, }]

# ...rest of the rules removed for brevity

declaration and assignment both start with the token kind ID, so statement fails the backtrack-free test.
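The failing test can be reproduced on an excerpt of the toy grammar by comparing the FIRST sets of every pair of alternatives of statement. This Python sketch is hypothetical: it keeps only the rules needed for FIRST(statement), treats expressions and args as opaque terminals (they never appear first in these alternatives), and assumes no epsilon rules and no left recursion in the excerpt.

```python
from itertools import combinations

Grammar = dict[str, list[tuple[str, ...]]]

# Excerpt of the toy grammar; symbols that are not keys are treated as terminals.
STATEMENT_RULES: Grammar = {
    "statement": [("declaration",), ("assignment",), ("fn_call",), ("ret",)],
    "declaration": [("ID", "ID", ";")],
    "assignment": [("ID", "=", "expressions", ";")],
    "fn_call": [("ID", "(", "args", ")", ";")],
    "ret": [("RETURN", "expressions", ";")],
}


def first_of(seq: tuple[str, ...], grammar: Grammar) -> set[str]:
    """FIRST of a non-empty sequence (assumes no epsilon rules, no left recursion)."""
    head = seq[0]
    if head not in grammar:
        return {head}
    return set().union(*(first_of(alt, grammar) for alt in grammar[head]))


def conflicts(grammar: Grammar, nt: str) -> list[tuple[tuple[str, ...], tuple[str, ...], set[str]]]:
    """Pairs of alternatives of nt whose FIRST sets overlap, with the shared tokens."""
    found = []
    for a, b in combinations(grammar[nt], 2):
        common = first_of(a, grammar) & first_of(b, grammar)
        if common:
            found.append((a, b, common))
    return found
```

Here conflicts(STATEMENT_RULES, "statement") reports three overlapping pairs (declaration/assignment, declaration/fn_call, assignment/fn_call), all on ID, while ret is conflict-free. Replacing declaration, assignment and fn_call by their right-hand sides would make all three alternatives start with the terminal ID, so ordinary left-factoring could then group them.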


Answer 1

Score: 0


In order to left-factor the grammar, you need to ensure that, among the rules for a given non-terminal, any two different symbols that begin a right-hand side (RHS) have disjoint FIRST sets. The easiest way to do that is to first left-expand the grammar: expand every initial non-terminal symbol on a RHS, so that the RHS of every rule begins with a terminal.
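The left-expansion step can be sketched in Python as follows. The grammar encoding (a dict from non-terminal to a list of symbol tuples), the function name left_expand and the max_rounds guard are assumptions for illustration. The guard is there because expansion never terminates on a left-recursive rule, which is why left recursion has to be removed before this step.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def left_expand(grammar: Grammar, nt: str, max_rounds: int = 100) -> Grammar:
    """Substitute leading non-terminals of nt's alternatives until each starts with a terminal."""
    result = {k: list(v) for k, v in grammar.items()}
    for _ in range(max_rounds):
        expanded: list[tuple[str, ...]] = []
        changed = False
        for alt in result[nt]:
            if alt and alt[0] in grammar:
                # Replace the leading non-terminal by each of its alternatives.
                expanded.extend(sub + alt[1:] for sub in result[alt[0]])
                changed = True
            else:
                expanded.append(alt)  # already starts with a terminal (or is epsilon)
        result[nt] = expanded
        if not changed:
            return result
    raise ValueError(f"{nt} did not reach terminal-initial form; is it left-recursive?")
```

On A -> B m | B n | m with B -> m, left_expand(grammar, "A") gives A -> m m | m n | m.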

Doing that for the grammar above gives

A -> m m | m n | m
B -> m

Now you can left-factor the A rules into

A -> m A'
A' -> m | n | ε
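Because each choice in the factored grammar is now decided by a single token, a recursive-descent parser for it needs no backtracking. A minimal Python sketch, assuming tokens arrive as a list of strings and "$" marks end of input; the Parser class, its method names and the tuple-shaped parse tree are hypothetical. Since A is the start symbol, FOLLOW(A') = FOLLOW(A) = {$}, so the ε alternative is chosen exactly when the next token is $.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive descent for A -> m A' ; A' -> m | n | eps, with one token of lookahead."""

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens + ["$"]  # "$" marks end of input (assumption of this sketch)
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos]

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_a(self) -> tuple:
        self.expect("m")  # A has a single alternative starting with m
        return ("A", "m", self.parse_a_prime())

    def parse_a_prime(self) -> tuple:
        tok = self.peek()
        if tok in ("m", "n"):  # FIRST of the m and n alternatives
            self.pos += 1
            return ("A'", tok)
        if tok == "$":  # lookahead is in FOLLOW(A'), so take the epsilon alternative
            return ("A'",)
        raise ParseError(f"unexpected {tok!r}")

    def parse(self) -> tuple:
        tree = self.parse_a()
        self.expect("$")  # the whole input must be consumed
        return tree
```

For example, Parser(["m", "n"]).parse() returns ("A", "m", ("A'", "n")), while Parser(["n"]).parse() raises ParseError without ever retrying an alternative.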
