ANTLR4: How to override text in lexer subrule/fragment

Question

The syntax I'm trying to parse includes a continuation indicator in column 71.
Identifiers, literals, almost anything can be continued onto the next line.

Ideally, I would like to drop the characters which make up the continuation token,
so that I'm left with only the identifier characters.
However, using the following lexer rules, the 'setText("")' in LINE_CONTINUATION
is ignored, thus polluting the final IDENTIFIER token.

IDENTIFIER 
	: 
	{getCharPositionInLine() < 71 }? IDENTIFIER_PART
	(
			{getCharPositionInLine() < 71 }? IDENTIFIER_PART  
		|	LINE_CONTINUATION 
	)*
;
fragment IDENTIFIER_PART: (LETTER|DIGIT|'_');
fragment DIGIT: [0-9];
fragment LETTER options { caseInsensitive=true; } : [A-Z];

//A continuation line is non-blank in column 72, followed by anything until EOL,
//then on the next line the characters starting after column position 15
LINE_CONTINUATION
	: 
	{getCharPositionInLine() == 71 }? 
	~[ ] 
	~[\r\n]* EOL
	({getCharPositionInLine() <= 15 }? [ ] )+  
	{setText("");}
; 

Is there any way of overriding the value of a subrule (or fragment) in the same way
that root rules can be overridden?

For example, there could be a list of identifiers defined as:

AAAAAAAAAAAA,BBBBBBBBBBB,CCCCCCCCCCCCCCCCC,DDDDDDDDDDD,EEEEEEEEEE,FFFF* Some comment
FFFF,GGGGGGGG

I'm trying to get tokens with text:

AAAAAAAAAAAA
BBBBBBBBBBB
CCCCCCCCCCCCCCCCC
DDDDDDDDDDD
EEEEEEEEEE
FFFFFFFF
GGGGGGGG

However, I get:

AAAAAAAAAAAA
BBBBBBBBBBB
CCCCCCCCCCCCCCCCC
DDDDDDDDDDD
EEEEEEEEEE
FFFF* Some comment\nFFFF
GGGGGGGG

Answer 1

Score: 0

That is not possible. You will have to do the setText(…) inside your IDENTIFIER rule. Try something like this (untested):

IDENTIFIER
 : {getCharPositionInLine() < 71 }? IDENTIFIER_PART
   ( {getCharPositionInLine() < 71 }? IDENTIFIER_PART  
   | LINE_CONTINUATION 
   )*
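   {
     // Intended to strip each continuation: indicator character, rest of line, EOL, leading blanks.
     // Caveat: \S also matches identifier characters, so the pattern may need adjusting.
     String text = getText();
     setText(text.replaceAll("\\S[^\r\n]*[\r\n]+[ ]{0,15}", ""));
   }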
;
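
Since the replaceAll pattern above begins with \S, it can start matching at the first identifier character rather than at the continuation indicator. One possible alternative is to strip by column position instead of by regex. The following is only a sketch under stated assumptions: the class name ContinuationStripper, the method strip, and the constant CONTINUATION_COLUMN are hypothetical names introduced for illustration, the indicator is assumed to sit at 0-based position 71 (as in the predicates of the question), and the caller is assumed to know the column at which the token starts.

```java
// Hypothetical helper, not part of ANTLR: removes continuation sequences
// from a token's text by tracking the 0-based column of each character.
final class ContinuationStripper {
    // Assumption: 0-based position of the continuation indicator.
    static final int CONTINUATION_COLUMN = 71;

    static String strip(String text, int startColumn) {
        StringBuilder out = new StringBuilder();
        int column = startColumn;
        int i = 0;
        while (i < text.length()) {
            if (column < CONTINUATION_COLUMN) {
                // Ordinary token character before the indicator column: keep it.
                out.append(text.charAt(i));
                i++;
                column++;
                continue;
            }
            // At the indicator: skip it and the rest of the line...
            while (i < text.length() && !isEol(text.charAt(i))) i++;
            // ...then the end-of-line characters...
            while (i < text.length() && isEol(text.charAt(i))) i++;
            // ...then the leading blanks of the continued line.
            column = 0;
            while (i < text.length() && text.charAt(i) == ' ') {
                i++;
                column++;
            }
        }
        return out.toString();
    }

    private static boolean isEol(char c) {
        return c == '\r' || c == '\n';
    }

    public static void main(String[] args) {
        // Token starts at column 67, so '*' lands on column 71.
        String raw = "FFFF* Some comment\n    FFFF";
        System.out.println(strip(raw, 67)); // prints FFFFFFFF
    }
}
```

Inside the IDENTIFIER action this could presumably be called as setText(ContinuationStripper.strip(getText(), _tokenStartCharPositionInLine)); where _tokenStartCharPositionInLine is the token start column field of the ANTLR4 Java Lexer. Like the answer's snippet, this wiring is untested within a grammar.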

huangapple
  • Posted on July 11, 2023, 10:16:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76658339.html