ANTLR4: How to override text in lexer subrule/fragment
Question
The syntax I'm trying to parse includes a continuation indicator in column 71.
Identifiers, literals, almost anything can be continued onto the next line.
Ideally, I would like to drop the characters that make up the continuation token,
so that I'm left with only the identifier characters.
However, with the following lexer rules, the setText("") in LINE_CONTINUATION
is ignored, so the continuation text pollutes the final IDENTIFIER token.
IDENTIFIER
    : {getCharPositionInLine() < 71 }? IDENTIFIER_PART
      ( {getCharPositionInLine() < 71 }? IDENTIFIER_PART
      | LINE_CONTINUATION
      )*
    ;
fragment IDENTIFIER_PART: (LETTER|DIGIT|'_');
fragment DIGIT: [0-9];
fragment LETTER options { caseInsensitive=true; } : [A-Z];
//A continuation line is non-blank in column 72, followed by anything until EOL,
//then on the next line the characters starting after column position 15
LINE_CONTINUATION
    : {getCharPositionInLine() == 71 }?
      ~[ ]
      ~[\r\n]* EOL
      ({getCharPositionInLine() <= 15 }? [ ] )+
      {setText("");}
    ;
Is there any way of overriding the value of a subrule (or fragment) in the same way
that root rules can be overridden?
For example, there could be a list of identifiers defined as:
AAAAAAAAAAAA,BBBBBBBBBBB,CCCCCCCCCCCCCCCCC,DDDDDDDDDDD,EEEEEEEEEE,FFFF* Some comment
FFFF,GGGGGGGG
I'm trying to get tokens with text:
AAAAAAAAAAAA
BBBBBBBBBBB
CCCCCCCCCCCCCCCCC
DDDDDDDDDDD
EEEEEEEEEE
FFFFFFFF
GGGGGGGG
However, I get:
AAAAAAAAAAAA
BBBBBBBBBBB
CCCCCCCCCCCCCCCCC
DDDDDDDDDDD
EEEEEEEEEE
FFFF* Some comment\nFFFF
GGGGGGGG
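The splicing rule described above can be pinned down without ANTLR. Below is a minimal plain-Java sketch, assuming the 0-based positions used by the predicates (a non-blank indicator at position 71, padding blanks at positions 0..15 of the next line); the class, method and constant names are hypothetical. Run over physical lines, it produces one logical line whose comma-separated parts are the tokens the question expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ContinuationSplice {
    // Hypothetical constants mirroring the question's predicates (0-based positions):
    // a non-blank at INDICATOR_POS marks a continuation, and blanks at positions
    // 0..LAST_BLANK_POS of the following line are padding, not content.
    static final int INDICATOR_POS = 71;
    static final int LAST_BLANK_POS = 15;

    // Joins physical lines into logical lines by dropping each continuation
    // indicator (plus the rest of its line) and the next line's padding blanks.
    static List<String> splice(List<String> lines) {
        List<String> logical = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean continued = false;
        for (String line : lines) {
            int start = 0;
            if (continued) {
                // Skip the padding blanks of a continuation line.
                while (start <= LAST_BLANK_POS && start < line.length() && line.charAt(start) == ' ') {
                    start++;
                }
            }
            continued = line.length() > INDICATOR_POS && line.charAt(INDICATOR_POS) != ' ';
            current.append(line, start, continued ? INDICATOR_POS : line.length());
            if (!continued) {
                logical.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            logical.add(current.toString()); // input ended on a continued line
        }
        return logical;
    }

    public static void main(String[] args) {
        // Assumed input: 66 A's, a comma and FFFF fill positions 0..70, so '*' lands at 71.
        String first = "A".repeat(66) + ",FFFF* Some comment";
        String second = " ".repeat(15) + "FFFF,GGGGGGGG";
        for (String logicalLine : splice(List.of(first, second))) {
            System.out.println(Arrays.toString(logicalLine.split(",")));
        }
    }
}
```

Splicing like this before the text reaches the lexer is another possible workaround, at the cost of token line/column information no longer matching the original source.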
Answer 1
Score: 0
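That is not possible. You will have to do the setText(…) inside your IDENTIFIER rule. Try something like this (untested):
IDENTIFIER
    : {getCharPositionInLine() < 71 }? IDENTIFIER_PART
      ( {getCharPositionInLine() < 71 }? IDENTIFIER_PART
      | LINE_CONTINUATION
      )*
      {
        // Remove each continuation: the indicator, the rest of its line, the line
        // break and the leading blanks of the next line. Assumes the indicator is
        // not a letter, digit or '_'; a pattern starting with \S would also match
        // at the identifier's first character and delete it.
        setText(getText().replaceAll("[^\\w\\r\\n][^\\r\\n]*(?:\\r\\n|\\r|\\n)[ ]*", ""));
      }
    ;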
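The text cleanup done in the IDENTIFIER action can be checked outside ANTLR. A minimal plain-Java sketch: the class and method names are hypothetical, and the token text is an assumed example of what getText() returns for the continued FFFF identifier. It contrasts a pattern that starts at the continuation indicator (assuming the indicator is not a letter, digit or '_') with one that starts at any non-blank character (\S). replaceAll also matches the latter at the identifier's first letter, so it deletes the identifier characters in front of the indicator too.

```java
import java.util.regex.Pattern;

class ContinuationCleanup {
    // Starts at the first character that cannot be part of an identifier.
    // Assumption: the continuation indicator is not a letter, digit or '_'.
    private static final Pattern CONTINUATION =
            Pattern.compile("[^\\w\\r\\n][^\\r\\n]*(?:\\r\\n|\\r|\\n)[ ]*");

    // Starts at any non-blank character, so it can match from the first
    // identifier letter onwards and swallow "FFFF" along with the comment.
    private static final Pattern NAIVE =
            Pattern.compile("\\S[^\\r\\n]*[\\r\\n]+[ ]{0,15}");

    static String strip(String tokenText) {
        return CONTINUATION.matcher(tokenText).replaceAll("");
    }

    static String stripNaive(String tokenText) {
        return NAIVE.matcher(tokenText).replaceAll("");
    }

    public static void main(String[] args) {
        // Identifier part, indicator and comment, line break,
        // padding blanks, then the rest of the identifier.
        String text = "FFFF* Some comment\n" + " ".repeat(15) + "FFFF";
        System.out.println(strip(text));      // FFFFFFFF
        System.out.println(stripNaive(text)); // FFFF
    }
}
```

If the indicator can itself be a letter or digit, no regex over the token text alone can tell it apart from identifier characters; the cleanup would then need the column information instead.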