Antlr4 - no viable alternative at input

Question

I am trying to create a simple HOCON parser, starting from the existing JSON grammar.

The grammar is defined as:

/** Taken from "The Definitive ANTLR 4 Reference" by Terence Parr */

// Derived from http://json.org
grammar HOCON;

hocon
   : value
   | pair
   ;

obj
   : object_begin pair (','? pair)* object_end
   | object_begin object_end
   ;

pair
   : STRING KV? value {fmt.Println("pairstr",$STRING.GetText())}
   | KEY KV? value {fmt.Println("pairkey",$KEY.GetText())}
   ;

array
   : array_begin value (',' value)* array_end
   | array_begin array_end
   ;

value
   : STRING {fmt.Println($STRING.GetText())}
   | REFERENCE {fmt.Println($REFERENCE.GetText())}
   | RAWSTRING {fmt.Println($RAWSTRING.GetText())}
   | NUMBER {fmt.Println($NUMBER.GetText())}
   | obj
   | array
   | 'true'
   | 'false'
   | 'null'
   ;

COMMENT
   : '#' ~( '\r' | '\n' )* -> skip
   ;

STRING
   : '"' (ESC | ~ ["\\])* '"'
   | '\'' (ESC | ~ ['\\])* '\''
   ;

RAWSTRING
   : (ESC | ALPHANUM)+
   ;

KEY
   : ( '.' | ALPHANUM | '-')+
   ;

REFERENCE
   : '${' (ALPHANUM|'.')+ '}'
   ;

fragment ESC
   : '\\' (["\\/bfnrt] | UNICODE)
   ;


fragment UNICODE
   : 'u' HEX HEX HEX HEX
   ;

fragment ALPHANUM
   : [0-9a-zA-Z]
   ;

fragment HEX
   : [0-9a-fA-F]
   ;

KV
   : [=:]
   ;

array_begin
   : '[' { fmt.Println("BEGIN [") }
   ;

array_end
   : ']' { fmt.Println("] END") }
   ;

object_begin
   : '{' { fmt.Println("OBJ {") }
   ;

object_end
   : '}' { fmt.Println("} OBJ") }
   ;

NUMBER
   : '-'? INT '.' [0-9] + EXP? | '-'? INT EXP | '-'? INT
   ;

fragment INT
   : '0' | [1-9] [0-9]*
   ;

// no leading zeros

fragment EXP
   : [Ee] [+\-]? INT
   ;

// \- since - means "range" inside [...]

WS
   : [ \t\n\r] + -> skip
   ;

The error is:

line 2:2 no viable alternative at input '{journal'
pairkey akka.persistence

The sample input that gives the error is:

akka.persistence {
  journal {
    # Absolute path to the journal plugin configuration entry used by
    # persistent actor or view by default.
    # Persistent actor or view can override `journalPluginId` method
    # in order to rely on a different journal plugin.
    plugin = ""
  }
}

However, if I update it to use quoted strings:

akka.persistence {
  'journal' {
    # Absolute path to the journal plugin configuration entry used by
    # persistent actor or view by default.
    # Persistent actor or view can override `journalPluginId` method
    # in order to rely on a different journal plugin.
    'plugin' = ""
  }
}

Everything works as expected.

It looks like I am missing something in the KEY definition, but I can't figure out what exactly.

The Go code to test it out is:

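package main

import (
	"log"

	"github.com/antlr/antlr4/runtime/Go/antlr"
	"go-hocon/parser"
)

func main() {
	// Open the sample configuration; stop early if it cannot be read.
	is, err := antlr.NewFileStream("test/simple1.conf")
	if err != nil {
		log.Fatal(err)
	}

	// Lex the input and parse it starting from the hocon rule.
	lex := parser.NewHOCONLexer(is)
	p := parser.NewHOCONParser(antlr.NewCommonTokenStream(lex, 0))
	p.BuildParseTrees = true
	p.Hocon()
}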

Answer 1

Score: 1

In your first input, journal is lexed as a RAWSTRING:

[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:28='journal',<RAWSTRING>,2:2]
[@3,30:30='{',<'{'>,2:10]
[@4,277:282='plugin',<RAWSTRING>,7:4]
[@5,284:284='=',<KV>,7:11]
[@6,286:287='""',<STRING>,7:13]
[@7,292:292='}',<'}'>,8:2]
[@8,295:295='}',<'}'>,9:0]
[@9,298:297='<EOF>',<EOF>,10:0]
line 2:2 no viable alternative at input '{journal'

On the other hand, 'journal' is lexed as a STRING, but it carries those single quotes, which you clearly don't want:

[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:30=''journal'',<STRING>,2:2]  <-- now it's a STRING token
[@3,32:32='{',<'{'>,2:12]
[@4,279:284='plugin',<RAWSTRING>,7:4]
[@5,286:286='=',<KV>,7:11]
[@6,288:289='""',<STRING>,7:13]
[@7,294:294='}',<'}'>,8:2]
[@8,297:297='}',<'}'>,9:0]
[@9,300:299='<EOF>',<EOF>,10:0]
line 7:4 no viable alternative at input '{plugin'
line 8:2 mismatched input '}' expecting {'true', 'false', 'null', '[', '{', STRING, RAWSTRING, REFERENCE, KV, NUMBER}

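Why? Because the lexer chooses a token in the following way:

  1. Match the longest input first.
  2. If the match lengths are equal, implicit tokens (literals used in parser rules, like 'true' or '{') take precedence.
  3. Otherwise, if the match lengths are equal, match based on the order of the lexer rules.

In your case, putting journal in single quotes means that only STRING can match it, because no other rule starts with a quote, so it seems to work okay. But that is only because of those single quotes. Without the quotes, journal and plugin can be matched by both RAWSTRING and KEY at the same length, and RAWSTRING is defined first, so per rule 3 above both tokens are matched as RAWSTRING, which doesn't fit the rule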

pair
   : STRING KV? value //{fmt.Println("pairstr",$STRING.GetText())}

Hence the error.
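
The tie-breaking described above can be imitated with a small Go sketch. This is not how ANTLR is implemented; it is only a simulation under stated assumptions: the two competing rules are modelled with regular expressions, ESC is left out of RAWSTRING for brevity, and the names rule, rules and classify are hypothetical helpers invented for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a simplified stand-in for a lexer rule: a name plus an anchored pattern.
type rule struct {
	name    string
	pattern *regexp.Regexp
}

// rules lists the two competing rules in the same order as in the grammar.
// Assumption: ESC is omitted from RAWSTRING to keep the sketch short.
var rules = []rule{
	{"RAWSTRING", regexp.MustCompile(`^[0-9a-zA-Z]+`)},
	{"KEY", regexp.MustCompile(`^[.0-9a-zA-Z-]+`)},
}

// classify returns the rule that wins at the start of input:
// the longest match wins, and on a tie the rule listed first wins.
func classify(input string) (string, string) {
	bestName, bestText := "", ""
	for _, r := range rules {
		m := r.pattern.FindString(input)
		if len(m) > len(bestText) { // strict ">" keeps the earlier rule on ties
			bestName, bestText = r.name, m
		}
	}
	return bestName, bestText
}

func main() {
	for _, in := range []string{"akka.persistence", "journal", "plugin"} {
		name, text := classify(in)
		fmt.Printf("%-16s -> %s (%q)\n", in, name, text)
	}
}
```

In this sketch akka.persistence comes out as KEY, because only KEY can consume the dot and therefore produces the longer match, while journal and plugin come out as RAWSTRING, because both rules match the same length and RAWSTRING is listed first. That agrees with the token dumps shown above.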

How to fix? Well, I reversed the lexer rules:

RAWSTRING
   : (ESC | ALPHANUM)+
   ;
 
STRING
   : '"' (ESC | ~ ["\\])* '"'
   | '\'' (ESC | ~ ['\\])* '\''
   ;

And changed pair:

pair
   : RAWSTRING KV? value //{fmt.Println("pairstr",$STRING.GetText())}

Now it parses fine:

[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:28='journal',<RAWSTRING>,2:2]
[@3,30:30='{',<'{'>,2:10]
[@4,277:282='plugin',<RAWSTRING>,7:4]
[@5,284:284='=',<KV>,7:11]
[@6,286:287='""',<STRING>,7:13]
[@7,292:292='}',<'}'>,8:2]
[@8,295:295='}',<'}'>,9:0]
[@9,298:297='<EOF>',<EOF>,10:0]

huangapple
  • Posted on July 7, 2017, 10:37:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/44961694.html