July 11, 2023 10:16:04

ANTLR4: How to override text in lexer subrule/fragment

Question

The syntax I'm trying to parse includes a continuation indicator in column 71.
Identifiers, literals, almost anything can be continued onto the next line.

Ideally, I would like to drop the characters which make up the continue token,
so that I'm left with only the identifier characters.
However, using the following lexer rules, the 'setText("")' in LINE_CONTINUATION
is ignored, thus polluting the final IDENTIFIER token.

IDENTIFIER
	:
	{getCharPositionInLine() < 71 }? IDENTIFIER_PART
	(
			{getCharPositionInLine() < 71 }? IDENTIFIER_PART
		|	LINE_CONTINUATION
	)*
;
fragment IDENTIFIER_PART: (LETTER|DIGIT|'_');
fragment DIGIT: [0-9];
fragment LETTER options { caseInsensitive=true; } : [A-Z];

//A continuation line is non-blank in column 72, followed by anything until EOL,
//then on the next line the characters starting after column position 15
LINE_CONTINUATION
	:
	{getCharPositionInLine() == 71 }?
	~[ ]
	~[\r\n]* EOL
	({getCharPositionInLine() <= 15 }? [ ] )+
	{setText("");} // set the token text to the empty string
;

Is there any way of overriding the value of a subrule (or fragment) in the same way
that root rules can be overridden?

For example, there could be a list of identifiers defined as:

AAAAAAAAAAAA,BBBBBBBBBBB,CCCCCCCCCCCCCCCCC,DDDDDDDDDDD,EEEEEEEEEE,FFFF* Some comment
FFFF,GGGGGGGG

I'm trying to get tokens with text:

AAAAAAAAAAAA
BBBBBBBBBBB
CCCCCCCCCCCCCCCCC
DDDDDDDDDDD
EEEEEEEEEE
FFFFFFFF
GGGGGGGG

However, I get:

AAAAAAAAAAAA
BBBBBBBBBBB
CCCCCCCCCCCCCCCCC
DDDDDDDDDDD
EEEEEEEEEE
FFFF* Some comment\nFFFF
GGGGGGGG

Answer 1

Score: 0

That is not possible. You will have to call setText(…) inside your IDENTIFIER rule. Try something like this (untested):

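IDENTIFIER
 : {getCharPositionInLine() < 71 }? IDENTIFIER_PART
   ( {getCharPositionInLine() < 71 }? IDENTIFIER_PART
   | LINE_CONTINUATION
   )*
   {
     // Post-process the whole token text here, because the setText("")
     // in LINE_CONTINUATION has no effect on the IDENTIFIER token.
     String text = getText();
     setText(text.replaceAll("\\S[^\r\n]*[\r\n]+[ ]{0,15}", ""));
   }
;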
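
One thing to watch when trying the snippet above: `\S` can match the very first identifier character, so on the sample token text from the question the pattern seems to remove the leading `FFFF` along with the comment. The sketch below is plain Java, outside any grammar, and compares the posted pattern with a variant that starts matching at the first non-identifier character. The class `ContinuationText` and both method names are hypothetical, and the variant assumes the continuation marker is never a letter, digit, or underscore, which the question does not guarantee.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of ANTLR) for cleaning continued identifier text. */
final class ContinuationText {
    // Pattern as posted in the answer: \S may already match the first identifier character.
    private static final Pattern AS_POSTED =
            Pattern.compile("\\S[^\\r\\n]*[\\r\\n]+[ ]{0,15}");

    // Variant. ASSUMPTION: the continuation marker is not a letter, digit, or underscore.
    // Matches: marker, rest of the line, line break(s), leading blanks of the next line.
    private static final Pattern FROM_MARKER =
            Pattern.compile("[^A-Za-z0-9_\\s][^\\r\\n]*[\\r\\n]+[ ]*");

    static String stripAsPosted(String tokenText) {
        return AS_POSTED.matcher(tokenText).replaceAll("");
    }

    static String stripKeepingPrefix(String tokenText) {
        return FROM_MARKER.matcher(tokenText).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = "FFFF* Some comment\nFFFF";      // token text reported in the question
        System.out.println(stripAsPosted(raw));        // prints FFFF
        System.out.println(stripKeepingPrefix(raw));   // prints FFFFFFFF
    }
}
```

If the assumption about the marker holds for your input, the same variant pattern could be used in the IDENTIFIER action in place of the posted one. If the marker can be an identifier character, a regex over the token text alone cannot tell it apart from the identifier, and the column position would have to be taken into account instead.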
