英文:
How to allow lexer to parse specific code parts from java?
问题
我目前正在使用antlr4创建一个编译器,它应该允许解析Java代码。
如何允许:
public void =(Integer value) => java { this.value = value; }
其中位于java { }之间的代码不被antlr解析,但应该在我的解析器中有一个访问者。
目前我有
但这显然不起作用,.*?会解析整个文件。
请不要回答“使用引号”,因为这不会是我的解决方案,因为我想允许Java代码高亮显示。
英文:
I am currently creating a compiler with antlr4 which should allow java code to be parsed.
How do i allow:
public void =(Integer value) => java { this.value = value; }
that the code between java { } is not being parsed by antlr, but should have a visitor in my parser.
Currently i have
javaStatementBody: KWJAVA LCURLY .*? RCURLY
but this obviously does not work and .*? parses the whole file.
Please do not answer with "use quotes", thats not gonna be my solution, because i want to allow java code highlighting.
答案1
得分: 2
你可以创建单独的词法分析器和语法分析器语法,以便可以使用词法模式。每当词法分析器“看到”输入java {
时,它会切换到JAVA_MODE
。而在Java模式下,你会对注释、字符串和字符字面值进行标记化。此外,在此模式下,如果遇到{
,你会推入相同的JAVA_MODE
,以便词法分析器知道它是嵌套的一次。当遇到}
时,你会从堆栈中弹出一个模式(结果要么回到默认模式,要么保持在Java模式但深度减少一级)。
一个快速的演示:
IslandLexer.g4
lexer grammar IslandLexer;
JAVA_START
: 'java' SPACES '{' -> pushMode(JAVA_MODE)
;
OTHER
: .
;
fragment SPACES : [ \t\r\n]+;
mode JAVA_MODE;
JAVA_CHAR : '\'' ( ~['\r\n] | '\\' [tbnrf'\] ) '\'';
JAVA_STRING : '"' ( ~["\r\n] | '\\' [tbnrf"\\] )* '"';
JAVA_LINE_COMMENT : '//' ~[\r\n]*;
JAVA_BLOCK_COMMENT : '/*' .*? '*/';
JAVA_OPEN_BRACE : '{' -> pushMode(JAVA_MODE);
JAVA_CLOSE_BRACE : '}' -> popMode;
JAVA_OTHER : ~[{}];
IslandParser.g4
parser grammar IslandParser;
options { tokenVocab=IslandLexer; }
parse
: unit* EOF
;
unit
: base_language
| java_language
;
base_language
: OTHER+
;
java_language
: JAVA_START java_atom+
;
java_atom
: JAVA_CHAR
| JAVA_STRING
| JAVA_LINE_COMMENT
| JAVA_BLOCK_COMMENT
| JAVA_OPEN_BRACE
| JAVA_CLOSE_BRACE
| JAVA_OTHER
;
使用以下代码进行测试:
String source = "foo \n" +
"\n" +
"java { \n" +
" char foo() { \n" +
" /* a quote in a comment \\\" */ \n" +
" String s = \"java {...}\"; \n" +
" return '}'; \n" +
" }\n" +
"}\n" +
"\n" +
"bar";
IslandLexer lexer = new IslandLexer(CharStreams.fromString(source));
IslandParser parser = new IslandParser(new CommonTokenStream(lexer));
System.out.println(parser.parse().toStringTree(parser));
这将生成以下的解析树:
英文:
You could create separate lexer and parser grammars so that you can use lexical modes. Whenever the lexer "sees" the input java {
, it moves to the JAVA_MODE
. And when in the Java mode, you tokenise comments, string- and char literals. Also when in this mode, you encounter a {
, you push the same JAVA_MODE
so that the lexer knows it's nested once. And when you encounter a }
, you pop a mode from the stack (resulting in either going back to the default mode, or staying in the Java mode but one level less deep).
A quick demo:
IslandLexer.g4
lexer grammar IslandLexer;
JAVA_START
: 'java' SPACES '{' -> pushMode(JAVA_MODE)
;
OTHER
: .
;
fragment SPACES : [ \t\r\n]+;
mode JAVA_MODE;
JAVA_CHAR : '\'' ( ~[\\'\r\n] | '\\' [tbnrf'\\] ) '\'';
JAVA_STRING : '"' ( ~[\\"\r\n] | '\\' [tbnrf"\\] )* '"';
JAVA_LINE_COMMENT : '//' ~[\r\n]*;
JAVA_BLOCK_COMMENT : '/*' .*? '*/';
JAVA_OPEN_BRACE : '{' -> pushMode(JAVA_MODE);
JAVA_CLOSE_BRACE : '}' -> popMode;
JAVA_OTHER : ~[{}];
IslandParser.g4
parser grammar IslandParser;
options { tokenVocab=IslandLexer; }
parse
: unit* EOF
;
unit
: base_language
| java_janguage
;
base_language
: OTHER+
;
java_janguage
: JAVA_START java_atom+
;
java_atom
: JAVA_CHAR
| JAVA_STRING
| JAVA_LINE_COMMENT
| JAVA_BLOCK_COMMENT
| JAVA_OPEN_BRACE
| JAVA_CLOSE_BRACE
| JAVA_OTHER
;
Test it with the following code:
String source = "foo \n" +
"\n" +
"java { \n" +
" char foo() { \n" +
" /* a quote in a comment \\\" */ \n" +
" String s = \"java {...}\"; \n" +
" return '}'; \n" +
" }\n" +
"}\n" +
"\n" +
"bar";
IslandLexer lexer = new IslandLexer(CharStreams.fromString(source));
IslandParser parser = new IslandParser(new CommonTokenStream(lexer));
System.out.println(parser.parse().toStringTree(parser));
which is the following parse tree:
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论