Antlr4: how to avoid excessive semantic predicates?
Question
Here is the beginning of my lexer rules:
```antlr
F_TEXT_START
 : {! matchingFText}? 'f"' {matchingFText = true;}
 ;

F_TEXT_PH_ESCAPE
 : {matchingFText && ! matchingFTextPh}? '{=/'
 ;

F_TEXT_PH_START
 : {matchingFText && ! matchingFTextPh}? '{=' {matchingFTextPh = true;}
 ;

F_TEXT_PH_END
 : {matchingFText && matchingFTextPh}? '}' {matchingFTextPh = false;}
 ;

F_TEXT_CHAR
 : {matchingFText && ! matchingFTextPh}? (~('"' | '{')+ | '""' | '{' ~'=')
 ;

F_TEXT_END
 : {matchingFText && ! matchingFTextPh}? '"' {matchingFText = false;}
 ;

IF
 : {! matchingFText || matchingFTextPh}? 'if'
 ;

ELIF
 : {! matchingFText || matchingFTextPh}? 'elif'
 ;

// Lots of other keywords

fragment LETTER
 : ('A' .. 'Z' | 'a' .. 'z' | '_')
 ;

VARIABLE
 : {! matchingFText || matchingFTextPh}? LETTER (LETTER | DIGIT)*
 ;
```
My formatted text is not lexed as a single token like a normal text literal. It is prefixed with `f`, and I add its parts to the parse tree so that errors inside it are reported while parsing (with just `parser.start()`). A formatted text starts with `f"` and ends with `"`, and any literal `"` inside it must be written as `""`. It can contain placeholders that start with `{=` and end with `}`. To write a literal `{=`, you have to write `{=/` instead.
The problem is that in normal formatted text content (outside a placeholder), the lexer started to match not only F_TEXT_CHAR but other lexer rules too, such as VARIABLE. What I did seems pretty dumb: I put a semantic predicate on every other rule so that it cannot match inside a formatted text's content (but still can inside a placeholder).
Isn't there a better way?
Answer 1
Score: 2
I'd use a lexical mode for this. To use lexical modes, you have to define separate lexer and parser grammars. Here's a quick demo:

```antlr
lexer grammar TestLexer;

F_TEXT_START
 : 'f"' -> pushMode(F_TEXT)
 ;

VARIABLE
 : LETTER (LETTER | DIGIT)*
 ;

// Closes a placeholder and returns to the F_TEXT mode underneath.
F_TEXT_PH_END
 : '}' -> popMode
 ;

SPACES
 : [ \t\r\n]+ -> skip
 ;

fragment LETTER
 : [a-zA-Z_]
 ;

fragment DIGIT
 : [0-9]
 ;

// Rules below only apply between f" and the closing ", outside placeholders.
mode F_TEXT;

F_TEXT_CHAR
 : ~["{]+ | '""' | '{' ~'='
 ;

// Defined in this mode because the escape occurs in text content.
// Being longer than '{=', it wins over F_TEXT_PH_START.
F_TEXT_PH_ESCAPE
 : '{=/'
 ;

// Placeholder content is lexed with the regular (default mode) rules.
F_TEXT_PH_START
 : '{=' -> pushMode(DEFAULT_MODE)
 ;

F_TEXT_END
 : '"' -> popMode
 ;
```

Use the lexer in your parser like this:

```antlr
parser grammar TestParser;

options {
  tokenVocab=TestLexer;
}

// ...
```

If you now tokenise the input `f"mu {=mu}" mu`, you'd get the following tokens:

```
F_TEXT_START     `f"`
F_TEXT_CHAR      `mu `
F_TEXT_PH_START  `{=`
VARIABLE         `mu`
F_TEXT_PH_END    `}`
F_TEXT_END       `"`
VARIABLE         `mu`
```
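The `// ...` in the parser grammar is where the rules for formatted text would go. As a minimal sketch, assuming the lexer emits F_TEXT_PH_ESCAPE while it is lexing text content, the tokens could be consumed roughly like this. The `start` rule name is taken from the question's `parser.start()` call, but its body here is an assumption. `expr`, `fText` and `fTextPart` are hypothetical names, and `expr` only stands in for the real expression grammar:

```antlr
parser grammar TestParser;

options {
  tokenVocab=TestLexer;
}

// Entry rule (name from the question, body assumed for illustration).
start
 : expr* EOF
 ;

// Hypothetical placeholder for the real expression grammar.
expr
 : fText
 | VARIABLE
 ;

// A formatted text: opening f", any number of parts, closing ".
fText
 : F_TEXT_START fTextPart* F_TEXT_END
 ;

// A part is raw text, an escaped '{=', or a placeholder wrapping an expression.
fTextPart
 : F_TEXT_CHAR
 | F_TEXT_PH_ESCAPE
 | F_TEXT_PH_START expr F_TEXT_PH_END
 ;
```

With rules along these lines, `f"mu {=mu}" mu` would parse as an `fText` followed by a plain `VARIABLE`, and a malformed placeholder such as `{=}` would surface as an ordinary syntax error from `start`. Because the lexer keeps a stack of modes, a formatted text nested inside a placeholder should need no extra rules.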