Antlr4: how to avoid excessive semantic predicates?
Question
Here is the beginning of my lexer rules:
F_TEXT_START
  : {! matchingFText}? 'f"' {matchingFText = true;}
  ;

F_TEXT_PH_ESCAPE
  : {matchingFText && ! matchingFTextPh}? '{=/'
  ;

F_TEXT_PH_START
  : {matchingFText && ! matchingFTextPh}? '{=' {matchingFTextPh = true;}
  ;

F_TEXT_PH_END
  : {matchingFText && matchingFTextPh}? '}' {matchingFTextPh = false;}
  ;

F_TEXT_CHAR
  : {matchingFText && ! matchingFTextPh}? (~('"' | '{')+ | '""' | '{' ~'=')
  ;

F_TEXT_END
  : {matchingFText && ! matchingFTextPh}? '"' {matchingFText = false;}
  ;

IF
  : {! matchingFText || matchingFTextPh}? 'if'
  ;

ELIF
  : {! matchingFText || matchingFTextPh}? 'elif'
  ;

// Lots of other keywords

fragment LETTER
  : ('A' .. 'Z' | 'a' .. 'z' | '_')
  ;

VARIABLE
  : {! matchingFText || matchingFTextPh}? LETTER (LETTER | DIGIT)*
  ;
My formatted text is not lexed as a single plain text token with an `f` in front of it. Instead, I add its parts to the parse tree, so that errors can be detected while parsing (with just `parser.start()`). A formatted text starts with `f"` and ends with `"`. Any `"` inside it must be written as `""`. It can contain placeholders that start with `{=` and end with `}`; to write a literal `{=`, you have to write `{=/` instead.
The problem is that inside normal formatted text content (outside a placeholder), the lexer started to match not only `F_TEXT_CHAR` but other lexer rules too, such as `VARIABLE`. What I did seems pretty dumb: I put a semantic predicate on every other rule so that it cannot match inside a formatted text's content (but can still match inside a placeholder).
Isn't there a better way?
Answer 1
Score: 2
<details>
<summary>English:</summary>
I'd use a lexical mode for this. Modes are only allowed in lexer grammars, so you'll have to define separate lexer and parser grammars. Here's a quick demo:
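lexer grammar TestLexer;

// Default mode: ordinary code, including the inside of a placeholder.
F_TEXT_START
  : 'f"' -> pushMode(F_TEXT)
  ;

VARIABLE
  : LETTER (LETTER | DIGIT)*
  ;

// Closes a placeholder and returns to the enclosing formatted text.
F_TEXT_PH_END
  : '}' -> popMode
  ;

SPACES
  : [ \t\r\n]+ -> skip
  ;

fragment LETTER
  : [a-zA-Z_]
  ;

fragment DIGIT
  : [0-9]
  ;

// Only the rules below can match inside a formatted text.
mode F_TEXT;

F_TEXT_CHAR
  : ~["{]+ | '""' | '{' ~'='
  ;

// The escape only occurs inside formatted text, so it belongs to this mode.
// It is longer than '{=', so it wins over F_TEXT_PH_START.
F_TEXT_PH_ESCAPE
  : '{=/'
  ;

F_TEXT_PH_START
  : '{=' -> pushMode(DEFAULT_MODE)
  ;

F_TEXT_END
  : '"' -> popMode
  ;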
Use the lexer in your parser like this:
parser grammar TestParser;
options {
tokenVocab=TestLexer;
}
// ...
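Because the lexer now emits only formatted-text tokens inside the text, the structure can be described directly in the parser. The rules below are a minimal sketch of what could stand where the `// ...` is. Only the rule name `start` comes from the question; `expr`, `fText` and `fTextPart` are hypothetical names, and `expr` is a stand-in for whatever expressions the real language allows.

```antlr
// Hypothetical entry rule; the question only tells us it is called `start`.
start
  : expr* EOF
  ;

// Stand-in for the real expression rule of the language.
expr
  : fText
  | VARIABLE
  ;

// A formatted text is its opening token, any number of parts, and its closing token.
fText
  : F_TEXT_START fTextPart* F_TEXT_END
  ;

// A part is raw text, an escaped '{=', or a placeholder holding an expression.
fTextPart
  : F_TEXT_CHAR
  | F_TEXT_PH_ESCAPE
  | F_TEXT_PH_START expr F_TEXT_PH_END
  ;
```

With rules of this shape, an unterminated placeholder or a missing closing quote is reported as an ordinary syntax error by the parser, which seems to be what the question is after.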
If you now tokenise the input `f"mu {=mu}" mu`, you'd get the following tokens:
F_TEXT_START `f"`
F_TEXT_CHAR `mu `
F_TEXT_PH_START `{=`
VARIABLE `mu`
F_TEXT_PH_END `}`
F_TEXT_END `"`
VARIABLE `mu`
</details>