2023-06-01 05:42:48

Why does ANTLR parser not throw an error for invalid numerical input in Java?

Question

I built the following ANTLR grammar (antlr4-runtime-4.13.0) for a simple condition:

grammar Condition;
@header {
package expression;
}
condition
    : expression ('OR' expression)*
    ;

expression
    : IDENT '=' NUM
    ;

IDENT : ('a'..'z' | 'A'..'Z')+;
NUM   : [0-9]+;
WS    : [ \t\r\n]+ -> skip;

I used this Java main method to test it:

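// ANTLR runtime classes used below; ConditionLexer, ConditionParser and
// ConditionBaseListener are generated from the grammar.
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.TokenStream;

public class TestANTLRGrammar extends ConditionBaseListener {

	public static void main(String[] args) {
		String entry = "id = 889xx88 OR y = 7";
		ConditionLexer lexer = new ConditionLexer(CharStreams.fromString(entry));
		TokenStream tokens = new CommonTokenStream(lexer);
		ConditionParser parser = new ConditionParser(tokens);
		parser.condition();
		System.out.println(parser.getNumberOfSyntaxErrors());
	}
}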

I expected the parser to report an error, because "889xx88" should not be accepted as a number. Instead, the parser recognized "id = 889" and stopped without processing the rest of the condition (i.e. "OR y = 7").
getNumberOfSyntaxErrors() returned 0.
How can I fix this?

Answer 1

Score: 0

For the input id = 889xx88 OR y = 7, the lexer will produce the following 9 tokens:

IDENT: id
'=': =
NUM: 889
IDENT: xx
NUM: 88
'OR': OR
IDENT: y
'=': =
NUM: 7
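
To see why "889xx88" splits into three tokens, here is a minimal sketch in plain Java that imitates the three lexer rules with a regular expression. It is not ANTLR-generated code: the class TokenSketch and its tokenize method are hypothetical names used only for illustration, and the lookahead on OR is an assumption standing in for ANTLR's longest-match rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT the generated ConditionLexer: it only imitates
// the lexer rules of the Condition grammar with java.util.regex.
class TokenSketch {
    // The literals 'OR' and '=' come first, then IDENT, NUM and WS.
    // The lookahead keeps "ORx" an IDENT, approximating longest-match.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<OR>OR(?![a-zA-Z]))|(?<EQ>=)|(?<IDENT>[a-zA-Z]+)"
        + "|(?<NUM>[0-9]+)|(?<WS>[ \\t\\r\\n]+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "Unrecognized character at index " + pos);
            }
            if (m.group("OR") != null) {
                tokens.add("'OR': " + m.group());
            } else if (m.group("EQ") != null) {
                tokens.add("'=': " + m.group());
            } else if (m.group("IDENT") != null) {
                tokens.add("IDENT: " + m.group());
            } else if (m.group("NUM") != null) {
                tokens.add("NUM: " + m.group());
            }
            // WS matches are dropped, like "-> skip" in the grammar.
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("id = 889xx88 OR y = 7").forEach(System.out::println);
    }
}
```

The point of the sketch is that no lexer rule covers a mix of digits and letters, so NUM ends at the first letter and IDENT picks up from there. The lexer itself never sees anything invalid in this input.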

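If you now let the parser rule condition consume these tokens, it matches IDENT '=' NUM (id = 889) as the first expression. The next token is IDENT (xx), not 'OR', so the optional ('OR' expression)* part matches zero times and the rule returns successfully. Nothing in condition requires the remaining tokens to be consumed, so no error is reported.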

As kaby76 mentioned in the comments, create a start rule that ends with the built-in EOF (end-of-file) token, and invoke that rule (parser.start()) instead of parser.condition(). This forces the parser to consume all tokens, or to report an error if it cannot:

start
 : condition EOF
 ;
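
The effect of the EOF token can be sketched with a small hand-written recursive-descent parser in plain Java. This is only an illustration under simplifying assumptions, not what ANTLR generates: the class EofSketch and its methods are hypothetical, the methods return a boolean instead of reporting and recovering from errors, and characters that match no token are silently skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written stand-in for the generated parser (hypothetical names).
class EofSketch {
    private final List<String> types = new ArrayList<>();
    private int pos = 0;

    EofSketch(String input) {
        // find() skips whitespace (and anything else that matches no token).
        Matcher m = Pattern.compile("OR(?![a-zA-Z])|=|[a-zA-Z]+|[0-9]+")
                           .matcher(input);
        while (m.find()) {
            String text = m.group();
            if (text.equals("OR") || text.equals("=")) {
                types.add(text);
            } else if (Character.isDigit(text.charAt(0))) {
                types.add("NUM");
            } else {
                types.add("IDENT");
            }
        }
        types.add("EOF"); // end-of-input marker, like ANTLR's built-in EOF
    }

    // Consume the current token only if it has the expected type.
    private boolean accept(String type) {
        if (types.get(pos).equals(type)) {
            pos++;
            return true;
        }
        return false;
    }

    // expression : IDENT '=' NUM ;
    private boolean expression() {
        return accept("IDENT") && accept("=") && accept("NUM");
    }

    // condition : expression ('OR' expression)* ;
    boolean condition() {
        if (!expression()) return false;
        while (accept("OR")) {
            if (!expression()) return false;
        }
        return true; // leftover tokens are simply ignored here
    }

    // start : condition EOF ;
    boolean start() {
        return condition() && accept("EOF");
    }

    public static void main(String[] args) {
        String entry = "id = 889xx88 OR y = 7";
        System.out.println(new EofSketch(entry).condition()); // true
        System.out.println(new EofSketch(entry).start());     // false
    }
}
```

In the sketch, condition() succeeds on the prefix "id = 889" and leaves the rest unread, while start() fails because the next token is IDENT (xx) rather than EOF.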

Note that, by default, the parser only prints the error to STDERR and then tries to continue parsing; this is ANTLR's default error recovery mode. getNumberOfSyntaxErrors() will report the error, but no exception is thrown. If you want an exception instead, look at ANTLR's BailErrorStrategy or at registering your own error listener (a subclass of BaseErrorListener), or search for "ANTLR custom error recovery" or "ANTLR custom error handler".
