August 5, 2020 16:10:19 · go

How to allow the lexer to parse specific code parts from Java?

Question

I am currently creating a compiler with ANTLR 4 that should allow Java code to be parsed.

How do I allow

public void =(Integer value) => java { this.value = value; }

so that the code between java { } is not parsed by ANTLR, but still gets a visitor in my parser?

Currently I have

javaStatementBody: KWJAVA LCURLY .*? RCURLY

but this obviously does not work: .*? parses the whole file.

Please do not answer with "use quotes"; that's not going to be my solution, because I want to allow Java code highlighting.

Answer 1

Score: 2

You could create separate lexer and parser grammars so that you can use lexical modes. Whenever the lexer "sees" the input java {, it moves to JAVA_MODE. In Java mode, you tokenise comments, string literals and char literals, so that braces inside them are not counted. Also, when you encounter a { in this mode, you push the same JAVA_MODE again so that the lexer knows it is nested one level deeper. When you encounter a }, you pop a mode from the stack (resulting in either going back to the default mode, or staying in Java mode but one level less deep).
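Because every pushed mode is the same JAVA_MODE, the mode stack effectively acts as a brace-depth counter that skips braces inside literals and comments. To make that concrete without the ANTLR runtime, here is a rough plain-Java sketch. The `IslandScanner` class and its `extractJavaBody` method are hypothetical names, not something ANTLR generates, and the scanner only approximates the lexer rules (for example, it accepts any character after a backslash).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written scanner (not ANTLR output) that approximates
// IslandLexer's JAVA_MODE push/pop logic to find where a "java { ... }" island ends.
public class IslandScanner {

    // Same shape as JAVA_START: 'java', one or more whitespace chars, then '{'.
    private static final Pattern JAVA_START = Pattern.compile("java[ \\t\\r\\n]+\\{");

    // Returns the text between the outer braces of the first island, or null.
    public static String extractJavaBody(String src) {
        Matcher m = JAVA_START.matcher(src);
        if (!m.find()) {
            return null;
        }
        int bodyStart = m.end();
        Deque<String> modes = new ArrayDeque<>();
        modes.push("JAVA_MODE"); // like -> pushMode(JAVA_MODE) on JAVA_START

        int i = bodyStart;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                i = skipQuoted(src, i, c);              // JAVA_STRING / JAVA_CHAR
            } else if (src.startsWith("//", i)) {
                int nl = src.indexOf('\n', i);          // JAVA_LINE_COMMENT
                i = (nl < 0) ? src.length() : nl;
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);     // JAVA_BLOCK_COMMENT
                i = (end < 0) ? i + 1 : end + 2;        // unterminated: '/' is JAVA_OTHER
            } else if (c == '{') {
                modes.push("JAVA_MODE");                // JAVA_OPEN_BRACE -> pushMode
                i++;
            } else if (c == '}') {
                modes.pop();                            // JAVA_CLOSE_BRACE -> popMode
                if (modes.isEmpty()) {
                    return src.substring(bodyStart, i); // back in the default mode
                }
                i++;
            } else {
                i++;                                    // JAVA_OTHER
            }
        }
        return null; // input ended before the island was closed
    }

    // Skips a quoted literal starting at 'open'. If it is not closed before a line
    // break or the end of input, only the quote itself is consumed (like JAVA_OTHER).
    private static int skipQuoted(String src, int open, char quote) {
        int j = open + 1;
        while (j < src.length()) {
            char ch = src.charAt(j);
            if (ch == '\\') {
                j += 2;                 // skip the escaped character
            } else if (ch == quote) {
                return j + 1;
            } else if (ch == '\r' || ch == '\n') {
                break;
            } else {
                j++;
            }
        }
        return open + 1;
    }

    public static void main(String[] args) {
        String source = "foo \n\njava { \n  char foo() { \n"
                + "    /* a quote in a comment \\\" */ \n"
                + "    String s = \"java {...}\"; \n"
                + "    return '}'; \n  }\n}\n\nbar";
        System.out.print(extractJavaBody(source));
    }
}
```

Running `main` prints only the body of the island: the `}` inside the comment, the string and the char literal are skipped, and the nested `char foo() { ... }` is balanced before the outer block closes. In the actual ANTLR setup this class is unnecessary, since the lexer modes do the same bookkeeping.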

A quick demo:

`IslandLexer.g4`

lexer grammar IslandLexer;

JAVA_START
 : 'java' SPACES '{' -> pushMode(JAVA_MODE)
 ;

OTHER
 : .
 ;

fragment SPACES : [ \t\r\n]+;

mode JAVA_MODE;

  JAVA_CHAR          : '\'' ( ~[\\'\r\n] | '\\' [btnfr"'\\] ) '\'';
  JAVA_STRING        : '"' ( ~[\\"\r\n] | '\\' [btnfr"'\\] )* '"';
  JAVA_LINE_COMMENT  : '//' ~[\r\n]*;
  JAVA_BLOCK_COMMENT : '/*' .*? '*/';
  JAVA_OPEN_BRACE    : '{' -> pushMode(JAVA_MODE); // nested block: one level deeper
  JAVA_CLOSE_BRACE   : '}' -> popMode;             // one level up, or back to the default mode
  JAVA_OTHER         : ~[{}];

`IslandParser.g4`

parser grammar IslandParser;

options { tokenVocab=IslandLexer; }

parse
 : unit* EOF
 ;

unit
 : base_language
 | java_language
 ;

base_language
 : OTHER+
 ;

java_language
 : JAVA_START java_atom+
 ;

java_atom
 : JAVA_CHAR
 | JAVA_STRING
 | JAVA_LINE_COMMENT
 | JAVA_BLOCK_COMMENT
 | JAVA_OPEN_BRACE
 | JAVA_CLOSE_BRACE
 | JAVA_OTHER
 ;

Test it with the following code:

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

String source = "foo \n" +
        "\n" +
        "java { \n" +
        "  char foo() { \n" +
        "    /* a quote in a comment \\\" */ \n" +
        "    String s = \"java {...}\"; \n" +
        "    return '}'; \n" +
        "  }\n" +
        "}\n" +
        "\n" +
        "bar";

IslandLexer lexer = new IslandLexer(CharStreams.fromString(source));
IslandParser parser = new IslandParser(new CommonTokenStream(lexer));
System.out.println(parser.parse().toStringTree(parser));

which prints the resulting parse tree in LISP-style notation.
