Antlr3 grammar generates a parsing error on encountering the Pound char

Question

ANTLR 3 generates an error when it encounters the Pound char ("£") of the French language, which is the equivalent of the Hash char "#" in English. This happens even though the Unicode values of the three special characters @, #, and $ are specified in the lexer/parser rules.

FYI: Although the Pound char (French) is the counterpart of the Hash char (English), the two are different Unicode characters: '£' is U+00A3 and '#' is U+0023.
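The Java check below prints the code points involved, including the three escapes used in the DIGIT fragment. It is a minimal sketch: CodePointCheck and codePoint are illustrative names. The Java source uses escapes instead of a literal pound sign, so it compiles the same way whatever encoding the source file is saved in.

```java
public class CodePointCheck {
    // Formats a (BMP) char as "U+XXXX".
    static String codePoint(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        // Pound sign, written as an escape so the source stays pure ASCII
        System.out.println("pound -> " + codePoint('\u00A3'));
        // Hash / number sign
        System.out.println("hash  -> " + codePoint('#'));
        // The three escapes from the DIGIT fragment decode to '@', '#', '$'
        System.out.println("" + '\u0040' + '\u0023' + '\u0024');
    }
}
```

None of the three escapes in the fragment is the pound sign. The fragment's '£' alternative is the only one meant to match it.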

The lexer/parser rules:

grammar SimpleCalc;

options
{
  k        = 8;
  language = Java;
  //filter   = true;
}
 
tokens {
    PLUS    = '+' ;
    MINUS   = '-' ;
    MULT    = '*' ;
    DIV = '/' ;
}
 
/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/
 
expr    : n1=NUMBER ( exp = ( PLUS | MINUS )  n2=NUMBER )* 
{
  if ($exp.text.equals("+"))
   System.out.println("Plus Result = " + $n1.text + $n2.text);
  else
   System.out.println("Minus Result = " + $n1.text + $n2.text);
}
;
 
/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/
 
NUMBER  : (DIGIT)+ ;
 
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+    { $channel = HIDDEN; } ;
 
fragment DIGIT  : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');

The text file is also read as UTF-8:

    import org.antlr.runtime.ANTLRFileStream;
    import org.antlr.runtime.CommonTokenStream;

    public class SimpleCalcMain // class name assumed; the post shows only main()
    {
        public static void main(String[] args) throws Exception
        {
            try
            {
                String fileName = "antlr_test.txt";
                // Decode the input file as UTF-8 before it reaches the lexer
                SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(fileName, "UTF-8"));
                CommonTokenStream tokens = new CommonTokenStream(lex);

                SimpleCalcParser parser = new SimpleCalcParser(tokens);

                parser.expr();
                //System.out.println(tokens);
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
        }
    }
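To see exactly which characters the lexer receives, you can decode the file's bytes the same way and list each char as a code point. This is a minimal sketch that assumes the file really is UTF-8. InputDump and dump are hypothetical names, and the sample bytes stand in for the real file; in practice they could come from java.nio.file.Files.readAllBytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class InputDump {
    // Decodes raw bytes with the given charset and lists each char as U+XXXX,
    // i.e. what a lexer fed by that decoding would see.
    static String dump(byte[] bytes, Charset cs) {
        String text = new String(bytes, cs);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the input line, encoded as UTF-8
        byte[] bytes = " \u00A33 + 4\u00A3".getBytes(StandardCharsets.UTF_8);
        System.out.println(dump(bytes, StandardCharsets.UTF_8));
        System.out.println(dump(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

With UTF-8 each pound sign shows up as a single U+00A3. Decoding the same bytes as ISO-8859-1 shows what a charset mismatch would look like: each pound sign becomes the pair U+00C2 U+00A3.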

The input file has only one line:

 £3 + 4£
 

The error is:

antlr_test.txt line 1:1 no viable alternative at character '£'
antlr_test.txt line 1:7 no viable alternative at character '£'
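This sketch assumes ANTLR 3 reports the column as a 0-based character position. On that assumption, both errors point exactly at the two '£' characters of the input line, which starts with a space, so the lexer is rejecting precisely the pound signs. The small check below uses hypothetical names (ErrorColumns, columnsOf):

```java
import java.util.ArrayList;
import java.util.List;

public class ErrorColumns {
    // Returns the 0-based positions of every occurrence of c in line.
    static List<Integer> columnsOf(String line, char c) {
        List<Integer> cols = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == c) cols.add(i);
        }
        return cols;
    }

    public static void main(String[] args) {
        // The input line, leading space included; pound signs written as escapes
        String line = " \u00A33 + 4\u00A3";
        System.out.println(columnsOf(line, '\u00A3'));
    }
}
```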

What is wrong with my approach, or did I miss something?

Answer 1

Score: 1

I cannot reproduce what you describe. When I test your grammar without modifications, I get a NumberFormatException, which is expected, because Integer.parseInt("£3") cannot succeed.
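The failure is easy to confirm in isolation: '£' is not a digit, so Integer.parseInt rejects any token text that still contains it. This is a minimal sketch, and ParseCheck and parses are illustrative names.

```java
public class ParseCheck {
    // True if Integer.parseInt accepts s, false if it throws NumberFormatException.
    static boolean parses(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("\u00A33")); // false: the pound sign is not a digit
        System.out.println(parses("3"));       // true
    }
}
```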

When I change your embedded code to this:

{
  if ($exp.text.equals("+"))
   System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) + Integer.parseInt($n2.text.replaceAll("\\D", ""))));
  else
   System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) - Integer.parseInt($n2.text.replaceAll("\\D", ""))));
}

and regenerate the lexer and parser classes (something you might not have done), then rerun the driver code, I get the following output:

Result = 7
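The arithmetic in the modified action can also be checked outside ANTLR as plain Java. The sketch below mirrors the action's logic on the token texts "£3" and "4£". The evaluate helper is hypothetical and is not something ANTLR generates.

```java
public class ActionLogic {
    // Mirrors the action: strip every non-digit ("\\D"), then add or subtract.
    static int evaluate(String n1, String op, String n2) {
        int a = Integer.parseInt(n1.replaceAll("\\D", ""));
        int b = Integer.parseInt(n2.replaceAll("\\D", ""));
        return op.equals("+") ? a + b : a - b;
    }

    public static void main(String[] args) {
        System.out.println("Result = " + evaluate("\u00A33", "+", "4\u00A3"));
    }
}
```

Stripping with \D throws away the pound signs (and @, #, $) before parsing. That is why the NUMBER tokens can keep them while the arithmetic still works.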

EDIT

Perhaps the pound sign in the grammar is the issue? What if you try:

fragment DIGIT  : '0'..'9' | '\u00A3' | ('\u0040' | '\u0023' | '\u0024');

instead of:

fragment DIGIT  : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');

?
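Here is one plausible way a literal '£' in the grammar could fail. Assume the .g file is saved as UTF-8 but decoded by the ANTLR tool with a different charset (ISO-8859-1 is used here only as an example). The single character then becomes two, so the DIGIT alternative no longer corresponds to a lone '£' in the input. The escaped form is plain ASCII and survives any ASCII-compatible decoding. The sketch below only simulates that mismatch; GrammarEncoding and misread are illustrative names.

```java
import java.nio.charset.StandardCharsets;

public class GrammarEncoding {
    // Simulates text saved as UTF-8 but decoded as ISO-8859-1.
    static String misread(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Literal pound sign inside quotes: comes back as quote, U+00C2, U+00A3, quote
        System.out.println(misread("'\u00A3'").length()); // 4 chars instead of 3
        // Escaped form (backslash, u, 00A3) is pure ASCII and comes back unchanged
        System.out.println(misread("'\\u00A3'"));
    }
}
```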

huangapple
  • Published on September 27, 2020, 02:00:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64080945.html