2020-08-30 18:28:29

Antlr4: how to avoid excessive semantic predicates?

Question

Here is the beginning of my lexer rules:

F_TEXT_START
	: {! matchingFText}? 'f"' {matchingFText = true;}
	;

F_TEXT_PH_ESCAPE
	: {matchingFText && ! matchingFTextPh}? '{=/'
	;

F_TEXT_PH_START
	: {matchingFText && ! matchingFTextPh}? '{=' {matchingFTextPh = true;}
	;

F_TEXT_PH_END
	: {matchingFText && matchingFTextPh}? '}' {matchingFTextPh = false;}
	;

F_TEXT_CHAR
	: {matchingFText && ! matchingFTextPh}? (~('"' | '{')+ | '""' | '{' ~'=')
	;

F_TEXT_END
	: {matchingFText && ! matchingFTextPh}? '"' {matchingFText = false;}
	;


IF
	: {! matchingFText || matchingFTextPh}? 'if'
	;

ELIF
	: {! matchingFText || matchingFTextPh}? 'elif'
	;

// Lots of other keywords

fragment LETTER
	: ('A' .. 'Z' | 'a' .. 'z' | '_')
	;

VARIABLE
	: {! matchingFText || matchingFTextPh}? LETTER (LETTER | DIGIT)*
	;

My formatted text is not lexed as a single ordinary text token with an f prefix. Instead, I add its parts to the parse tree, so that errors inside it are detected during parsing (with just parser.start()). A formatted text starts with f" and ends with ". Any " inside it must be written as "", and it can contain placeholders that start with {= and end with }. To write a literal {=, you have to write {=/ instead.
The problem is that inside ordinary formatted text content (outside a placeholder), the lexer started to match not only F_TEXT_CHAR but other lexer rules too, such as VARIABLE. What I did seems pretty dumb: I put a semantic predicate on every other rule to keep it from matching inside a formatted text's content (while still letting it match inside a placeholder).

Isn't there a better way?

Answer 1

Score: 2

I'd use a lexical mode for this. To use lexical modes, you'll have to define separate lexer and parser grammars. Here's a quick demo:

```antlr
lexer grammar TestLexer;

// Opening quote: everything up to the closing quote is lexed in mode F_TEXT.
F_TEXT_START
 : 'f"' -> pushMode(F_TEXT)
 ;

VARIABLE
 : LETTER (LETTER | DIGIT)*
 ;

// End of a placeholder: return to the mode that was active before '{='.
F_TEXT_PH_END
 : '}' -> popMode
 ;

SPACES
 : [ \t\r\n]+ -> skip
 ;

fragment LETTER
 : [a-zA-Z_]
 ;

fragment DIGIT
 : [0-9]
 ;

mode F_TEXT;

  F_TEXT_CHAR
   : ~["{]+ | '""' | '{' ~'='
   ;

  // '{=/' is the escape for a literal '{=' inside the text.
  // It is longer than '{=', so it wins over F_TEXT_PH_START.
  F_TEXT_PH_ESCAPE
   : '{=/'
   ;

  // Start of a placeholder: its content is lexed with the default rules.
  F_TEXT_PH_START
   : '{=' -> pushMode(DEFAULT_MODE)
   ;

  F_TEXT_END
   : '"' -> popMode
   ;
```

Use the lexer in your parser like this:

```antlr
parser grammar TestParser;

options {
  tokenVocab=TestLexer;
}

// ...
```

If you now tokenise the input `f"mu {=mu}" mu`, you'd get the following tokens:

```
F_TEXT_START              `f"`
F_TEXT_CHAR               `mu `
F_TEXT_PH_START           `{=`
VARIABLE                  `mu`
F_TEXT_PH_END             `}`
F_TEXT_END                `"`
VARIABLE                  `mu`
```
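To make it clearer why the mode stack removes the need for predicates, here is a small hand-written Python simulation of the idea. It is only a sketch and not how ANTLR is implemented: the `RULES` table and the `tokenize` function are hypothetical names, the regular expressions are my own approximations of the lexer rules above, and `F_TEXT_PH_ESCAPE` is placed in the `F_TEXT` mode on the assumption that the escape only has meaning inside the text.

```python
import re

# Rules per mode. Within a mode the longest match wins, and ties go to the
# rule listed first (the same tie-breaking an ANTLR lexer uses).
# Each entry is (token name, regex, action); action is None, "skip", "pop",
# or ("push", mode). All names here are illustrative assumptions.
RULES = {
    "DEFAULT_MODE": [
        ("F_TEXT_START", r'f"', ("push", "F_TEXT")),
        ("VARIABLE", r"[A-Za-z_][A-Za-z_0-9]*", None),
        ("F_TEXT_PH_END", r"\}", "pop"),
        ("SPACES", r"[ \t\r\n]+", "skip"),
    ],
    "F_TEXT": [
        ("F_TEXT_CHAR", r'[^"{]+|""|\{[^=]', None),
        ("F_TEXT_PH_ESCAPE", r"\{=/", None),
        ("F_TEXT_PH_START", r"\{=", ("push", "DEFAULT_MODE")),
        ("F_TEXT_END", r'"', "pop"),
    ],
}


def tokenize(text):
    """Return (token name, matched text) pairs, tracking modes on a stack."""
    mode_stack = []        # modes to return to, like pushMode/popMode
    mode = "DEFAULT_MODE"  # only the rules of the current mode are tried
    pos = 0
    tokens = []
    while pos < len(text):
        best = None
        for name, pattern, action in RULES[mode]:
            m = re.compile(pattern).match(text, pos)
            # Strictly longer matches replace the current best; equal
            # lengths keep the earlier rule.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m, action)
        if best is None:
            raise ValueError(f"no rule in mode {mode} matches at offset {pos}")
        name, m, action = best
        pos = m.end()
        if action != "skip":
            tokens.append((name, m.group()))
        if action == "pop":
            mode = mode_stack.pop()
        elif isinstance(action, tuple):
            mode_stack.append(mode)  # remember where to come back to
            mode = action[1]
    return tokens


if __name__ == "__main__":
    for name, value in tokenize('f"mu {=mu}" mu'):
        print(f"{name:<20} {value!r}")
```

Because only the rules of the current mode are candidates, `VARIABLE` and the keyword rules can never match inside the text part, and no rule needs a predicate. The stack is what lets a placeholder return to `F_TEXT` when it closes.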
