Antlr4: how to avoid excessive semantic predicates?

Question

Here is the beginning of my lexer rules:

F_TEXT_START
	: {! matchingFText}? 'f"' {matchingFText = true;}
	;

F_TEXT_PH_ESCAPE
	: {matchingFText && ! matchingFTextPh}? '{=/'
	;

F_TEXT_PH_START
	: {matchingFText && ! matchingFTextPh}? '{=' {matchingFTextPh = true;}
	;

F_TEXT_PH_END
	: {matchingFText && matchingFTextPh}? '}' {matchingFTextPh = false;}
	;

F_TEXT_CHAR
	: {matchingFText && ! matchingFTextPh}? (~('"' | '{')+ | '""' | '{' ~'=')
	;

F_TEXT_END
	: {matchingFText && ! matchingFTextPh}? '"' {matchingFText = false;}
	;


IF
	: {! matchingFText || matchingFTextPh}? 'if'
	;

ELIF
	: {! matchingFText || matchingFTextPh}? 'elif'
	;

// Lots of other keywords

fragment LETTER
	: ('A' .. 'Z' | 'a' .. 'z' | '_')
	;
	
VARIABLE
	: {! matchingFText || matchingFTextPh}? LETTER (LETTER | DIGIT)*
	;

I don't treat a formatted text as a single plain text token with an f in front of it. Instead, I tokenize its parts so that they end up in my parse tree, which lets me detect errors while parsing (with just parser.start()). A formatted text starts with f" and ends with ". Any " inside it must be written as "", and it can contain placeholders that start with {= and end with }. To write a literal {=, you have to write {=/ instead.

The problem is that inside the ordinary content of a formatted text (outside a placeholder), the lexer started to match not only F_TEXT_CHAR but other lexer rules too, such as VARIABLE. What I did seems pretty dumb: I put a semantic predicate on every other rule to keep it from matching inside a formatted text's content (while still letting it match inside a placeholder).

Isn't there a better way?

Answer 1

Score: 2

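I'd use lexical modes for this. To use lexical modes, you have to define separate lexer and parser grammars. Here's a quick demo:

```antlr
lexer grammar TestLexer;

// Entering a formatted text switches the lexer to the F_TEXT mode.
F_TEXT_START
 : 'f"' -> pushMode(F_TEXT)
 ;

VARIABLE
 : LETTER (LETTER | DIGIT)*
 ;

// Closes a placeholder and returns to the enclosing F_TEXT mode.
F_TEXT_PH_END
 : '}' -> popMode
 ;

SPACES
 : [ \t\r\n]+ -> skip
 ;

fragment LETTER
 : [a-zA-Z_]
 ;

fragment DIGIT
 : [0-9]
 ;

mode F_TEXT;

  // Escaped '{='. It is one character longer than F_TEXT_PH_START,
  // so it wins when both rules match.
  F_TEXT_PH_ESCAPE
   : '{=/'
   ;

  F_TEXT_CHAR
   : ~["{]+ | '""' | '{' ~'='
   ;

  // Opens a placeholder: the default-mode rules apply until the next '}'.
  F_TEXT_PH_START
   : '{=' -> pushMode(DEFAULT_MODE)
   ;

  F_TEXT_END
   : '"' -> popMode
   ;
```

Use the lexer in your parser like this:

```antlr
parser grammar TestParser;

options {
  tokenVocab=TestLexer;
}

// ...
```

If you now tokenise the input `f"mu {=mu}" mu`, you get the following tokens:

```
F_TEXT_START              `f"`
F_TEXT_CHAR               `mu `
F_TEXT_PH_START           `{=`
VARIABLE                  `mu`
F_TEXT_PH_END             `}`
F_TEXT_END                `"`
VARIABLE                  `mu`
```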
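
To see why modes make the predicates unnecessary, here is a minimal plain-Python sketch that imitates a lexer with a mode stack. It is not ANTLR's implementation, only an illustration of the idea: each mode has its own rule list, so keyword and variable rules are simply not candidates while the lexer is inside the text. The rule names mirror the demo lexer; the `RULES` table, the `tokenize` helper, and listing `F_TEXT_PH_ESCAPE` among the text-mode rules are assumptions of this sketch.

```python
import re

# Rules per mode as (token name, regex, action). The longest match wins and
# ties go to the earlier rule, which imitates how an ANTLR lexer chooses.
# An action is None, "skip", "pop", or ("push", mode_name).
RULES = {
    "DEFAULT_MODE": [
        ("F_TEXT_START", r'f"', ("push", "F_TEXT")),
        ("VARIABLE", r"[A-Za-z_][A-Za-z_0-9]*", None),
        ("F_TEXT_PH_END", r"\}", "pop"),
        ("SPACES", r"[ \t\r\n]+", "skip"),
    ],
    "F_TEXT": [
        ("F_TEXT_PH_ESCAPE", r"\{=/", None),
        ("F_TEXT_CHAR", r'[^"{]+|""|\{[^=]', None),
        ("F_TEXT_PH_START", r"\{=", ("push", "DEFAULT_MODE")),
        ("F_TEXT_END", r'"', "pop"),
    ],
}


def tokenize(text):
    """Return (token name, matched text) pairs for the given input."""
    stack = []              # modes to return to, like ANTLR's mode stack
    mode = "DEFAULT_MODE"
    pos = 0
    tokens = []
    while pos < len(text):
        best = None
        # Only the rules of the current mode are candidates.
        for name, pattern, action in RULES[mode]:
            match = re.compile(pattern).match(text, pos)
            if match and (best is None or match.end() > best[1]):
                best = (name, match.end(), action)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos} in mode {mode}")
        name, end, action = best
        if action != "skip":
            tokens.append((name, text[pos:end]))
        if action == "pop":
            mode = stack.pop()          # fails on an unbalanced '}' or '"'
        elif isinstance(action, tuple):
            stack.append(mode)          # remember where to come back to
            mode = action[1]
        pos = end
    return tokens


if __name__ == "__main__":
    for name, matched in tokenize('f"mu {=mu}" mu'):
        print(f"{name:<18}{matched!r}")
```

Because `VARIABLE` is not listed under `F_TEXT`, a word such as `if` or `mu` inside the text can only become part of an `F_TEXT_CHAR` token, while the same word inside a placeholder is lexed by the default-mode rules again. That is the effect the predicates in the question were emulating by hand.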




huangapple
  • Posted on August 30, 2020, 18:28:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63656467.html