左因子化如何消除回溯,同时存在非终结符?

huangapple go评论113阅读模式
英文:

How can left-factoring eliminate backtracking while there are non-terminals?

问题

I'm trying to implement a parser generator (for fun), following the book Engineering a Compiler. The book suggests left-factoring for eliminating backtracking:

A -> B m | B n | m
B -> m

Becomes:

A -> B A' | m
B -> m
A' -> m | n

But now, B starts with m, A' starts with m, and A has a production alternative that also starts with m, hence the start sets for A, B and A' have the common element m, disqualifying the grammar for a LL(1) parser.

Is the grammar above inherently a non-backtrack free grammar and requires a more powerful parser such as a LR(1) parser? Or am I applying the left-factoring wrong? or there needs to be a prior step to left factoring so that no rule starts with a non-terminal (is it even possible?)

I feel the book is missing some description, in lot of places and this is one example.


In the above grammar, I simplified the issue I'm facing with the following toy grammar:

start           -> fn_declaration start | fn_declaration |
fn_call         -> ID ( args ) ;
args            -> args , arg | arg |
arg             -> STR | INT | ID
fn_declaration  -> FN ID ( params ) { statements }
params          -> param , params | param |
param           -> ID ID
statements      -> statement , statements | statement |
statement       -> declaration | assignment | fn_call | ret
declaration     -> ID ID ;
assignment      -> ID = expressions ;
expressions     -> terms + expressions | terms - expressions | terms
terms           -> factor * terms | factor / terms | factor
factor          -> ( expressions ) | INT | ID
ret             -> RETURN expressions ;

To which I'm applying these steps, in this order:

  • Eliminate epsilons.
  • Eliminate left recursion.
  • Eliminate backtracking (by left-factoring).
  • Calculate first and follow sets.

All steps do their job, but I eventually have these rules:

statement => alt=[ declaration | assignment | ret | Id ( statement_p0 ], first=[Id, return, ;], follow=[,, }]
statement_p0 => alts=[ args ) ; | ) ;], first=[), Int, Str, Id], follow=[,, }]

declaration and assignment share ID as their first token kind, and fail the is-backtrack-free test.

英文:

I'm trying to implement a parser generator (for fun), following the book Engineering a Compiler. The book suggests left-factoring for eliminating backtracking:

A -> B m | B n | m
B -> m

Becomes:

A -> B A' | m
B -> m
A' -> m | n

But now, B starts with m, A' starts with m, and A has a production alternative that also starts with m, hence the start sets for A, B and A' have the common element m, disqualifying the grammar for a LL(1) parser.

Is the grammar above inherently a non-backtrack free grammar and requires a more powerful parser such as a LR(1) parser? Or am I applying the left-factoring wrong? or there needs to be a prior step to left factoring so that no rule starts with a non-terminal (is it even possible?)

I feel the book is missing some description, in lot of places and this is one example.


In the above grammar, I simplified the issue I'm facing with the following toy grammar:

start           -> fn_declaration start | fn_declaration |
fn_call         -> ID ( args ) ;
args            -> args , arg | arg |
arg             -> STR | INT | ID
fn_declaration  -> FN ID ( params ) { statements }
params          -> param , params | param |
param           -> ID ID
statements      -> statement , statements | statement |
statement       -> declaration | assignment | fn_call | ret
declaration     -> ID ID ;
assignment      -> ID = expressions ;
expressions     -> terms + expressions | terms - expressions | terms
terms           -> factor * terms | factor / terms | factor
factor          -> ( expressions ) | INT | ID
ret             -> RETURN expressions ;

To which I'm applying these steps, in this order:

  • Eliminate epsilons.
  • Eliminate left recursion.
  • Eliminate backtracking (by left-factoring).
  • Calculate first and follow sets.

All steps do their job, but I eventually have these rules:

statement => alt=[ declaration | assignment | ret | Id ( statement_p0 ], first=[Id, return, ;], follow=[,, }]
statement_p0 => alts=[ args ) ; | ) ;], first=[), Int, Str, Id], follow=[,, }]

# ...rest of the rules removed for brevity

declaration and assignment share ID as their first token kind, and fail the is-backtrack-free test.

答案1

得分: 0

为了左因子化语法,您需要确保每个用于某个非终结符的规则的RHS的起始符号与其他符号具有不相交的FIRST集。最简单的方法是首先左展开语法——扩展RHS上的每个初始非终结符号,以便每个规则的RHS都以终结符开头。

这样做后,您会得到以下结果:

A -> m m | m n | m
B -> m

现在您可以将A规则左因子化为:

A -> m A'
A' -> m | n | ε

英文:

In order to left-factor the grammar, you need to ensure that every symbol that begins the RHS of a rule for some non-terminal has a disjoint FIRST set from every other symbol. The easiest way to do that is to first left-expand the grammar -- expand every initial non-terminal symbol on a RHS so that the RHS of every rule begins with a terminal.

When you do that you get

A -> m m | m n | m
B -> m

so now you left-factor the A rules to

A -> m A'
A' -> m | n | ε

huangapple
  • 本文由 发表于 2023年6月8日 12:43:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76428679.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定