2020-09-27 02:00:48

Antlr3 grammar generates a parsing error on encountering the pound character

Question

ANTLR 3 reports an error when it encounters the pound character ("£") used in French text, which I understand to be the equivalent of the English hash character "#". This happens even though the Unicode values for the three special characters @, #, and $ are specified in the lexer rules.

FYI: as far as I know, the Unicode value of the pound character (French) = the Unicode value of the hash character (English).
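
The FYI above can be checked by printing the code points directly. A minimal sketch, assuming plain Java with no ANTLR involved (the class name CodePointCheck and the helper hex are made up for illustration). The pound sign is U+00A3 and the hash sign is U+0023, so they are distinct characters and a rule that lists '\u0023' does not cover "£":

```java
// Hypothetical helper for illustration; not part of the original grammar or driver.
final class CodePointCheck {

    // Formats a character as a Unicode code point label such as U+0023.
    static String hex(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        // Escapes are used so the result does not depend on the source file encoding.
        System.out.println("pound sign : " + hex('\u00A3')); // prints U+00A3
        System.out.println("hash sign  : " + hex('\u0023')); // prints U+0023
        System.out.println("at sign    : " + hex('\u0040')); // prints U+0040
        System.out.println("dollar sign: " + hex('\u0024')); // prints U+0024
    }
}
```

Only the first line of output corresponds to the character that appears in the input file; the other three are the characters that the DIGIT rule lists by escape.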

The lexer/parser rules:

grammar SimpleCalc;

options
{
  k        = 8;
  language = Java;
  //filter   = true;
}
 
tokens {
    PLUS    = '+' ;
    MINUS   = '-' ;
    MULT    = '*' ;
    DIV     = '/' ;
}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

expr    : n1=NUMBER ( exp = ( PLUS | MINUS )  n2=NUMBER )* 
{
  if ($exp.text.equals("+"))
   System.out.println("Plus Result = " + $n1.text + $n2.text);
  else
   System.out.println("Minus Result = " + $n1.text + $n2.text);
}
;

/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

NUMBER  : (DIGIT)+ ;

WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+    { $channel = HIDDEN; } ;

fragment DIGIT  : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');

The text file is also read as UTF-8:

    public static void main(String[] args) throws Exception
    {
        try
        {
            args = new String[1];
            args[0] = new String("antlr_test.txt");
            SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0], "UTF-8"));
            CommonTokenStream tokens = new CommonTokenStream(lex);
            
            SimpleCalcParser parser = new SimpleCalcParser(tokens);
            
            parser.expr();
            //System.out.println(tokens);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

The input file has only one line:

 £3 + 4£

The error is:

antlr_test.txt line 1:1 no viable alternative at character '£'
antlr_test.txt line 1:7 no viable alternative at character '£'

What is wrong with my approach?
Or did I miss something?

Answer 1

Score: 1

I cannot reproduce what you describe. When I test your grammar without modifications, I get a NumberFormatException, which is expected, because Integer.parseInt("£3") cannot succeed.

When I change your embedded code into this:

{
  if ($exp.text.equals("+"))
   System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) + Integer.parseInt($n2.text.replaceAll("\\D", ""))));
  else
   System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) - Integer.parseInt($n2.text.replaceAll("\\D", ""))));
}

and regenerate lexer and parser classes (something you might not have done) and rerun the driver code, I get the following output:

Result = 7
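
The digit-stripping step in that action can be tried on its own, outside ANTLR. A minimal sketch, assuming the token texts are "£3" and "4£" as in the question's input (the class name StripNonDigits and the method parseDigits are hypothetical):

```java
// Hypothetical stand-alone version of the replaceAll + parseInt step from the action above.
final class StripNonDigits {

    // Removes every non-digit character (regex \D), then parses what is left.
    static int parseDigits(String tokenText) {
        return Integer.parseInt(tokenText.replaceAll("\\D", ""));
    }

    public static void main(String[] args) {
        int n1 = parseDigits("\u00A33"); // pound sign followed by 3
        int n2 = parseDigits("4\u00A3"); // 4 followed by pound sign
        System.out.println("Result = " + (n1 + n2)); // prints Result = 7
    }
}
```

Note that a token made only of symbols (for example a lone "£") would leave an empty string after stripping, and Integer.parseInt would then throw a NumberFormatException.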

EDIT

Perhaps the pound sign in the grammar is the issue? What if you try:

fragment DIGIT  : '0'..'9' | '\u00A3' | ('\u0040' | '\u0023' | '\u0024');

instead of:

fragment DIGIT  : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');
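
One possible reason the escape could behave differently from the literal is that the grammar file is saved in one encoding and read in another. That is an assumption, not something confirmed in this thread. A minimal sketch of the effect in plain Java (the class name PoundEncoding and the method misread are hypothetical): a "£" stored as UTF-8 takes two bytes, and decoding those bytes as ISO-8859-1 yields two characters instead of one, so a lexer built from such a grammar would no longer match a single "£".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of an encoding mismatch; not taken from the original post.
final class PoundEncoding {

    // Encodes the text as UTF-8, then decodes the same bytes as ISO-8859-1.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // escape keeps this source file pure ASCII
        String wrong = misread(pound);

        System.out.println(pound.getBytes(StandardCharsets.UTF_8).length); // prints 2
        System.out.println(wrong.length());                                // prints 2
        System.out.println(wrong.equals(pound));                           // prints false
    }
}
```

Writing '\u00A3' in the grammar sidesteps this, because the escape consists of ASCII characters only and reads the same under either encoding.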
