Antlr4 - no viable alternative at input
Question
I am trying to create a simple HOCON parser (starting from the existing JSON grammar).
The grammar is defined as:
/** Taken from "The Definitive ANTLR 4 Reference" by Terence Parr */
// Derived from http://json.org
grammar HOCON;
hocon
: value
| pair
;
obj
: object_begin pair (','? pair)* object_end
| object_begin object_end
;
pair
: STRING KV? value {fmt.Println("pairstr",$STRING.GetText())}
| KEY KV? value {fmt.Println("pairkey",$KEY.GetText())}
;
array
: array_begin value (',' value)* array_end
| array_begin array_end
;
value
: STRING {fmt.Println($STRING.GetText())}
| REFERENCE {fmt.Println($REFERENCE.GetText())}
| RAWSTRING {fmt.Println($RAWSTRING.GetText())}
| NUMBER {fmt.Println($NUMBER.GetText())}
| obj
| array
| 'true'
| 'false'
| 'null'
;
COMMENT
: '#' ~( '\r' | '\n' )* -> skip
;
STRING
: '"' (ESC | ~ ["\\])* '"'
| '\'' (ESC | ~ ['\\])* '\''
;
RAWSTRING
: (ESC | ALPHANUM)+
;
KEY
: ( '.' | ALPHANUM | '-')+
;
REFERENCE
: '${' (ALPHANUM|'.')+ '}'
;
fragment ESC
: '\\' (["\\/bfnrt] | UNICODE)
;
fragment UNICODE
: 'u' HEX HEX HEX HEX
;
fragment ALPHANUM
: [0-9a-zA-Z]
;
fragment HEX
: [0-9a-fA-F]
;
KV
: [=:]
;
array_begin
: '[' { fmt.Println("BEGIN [") }
;
array_end
: ']' { fmt.Println("] END") }
;
object_begin
: '{' { fmt.Println("OBJ {") }
;
object_end
: '}' { fmt.Println("} OBJ") }
;
NUMBER
: '-'? INT '.' [0-9]+ EXP? | '-'? INT EXP | '-'? INT
;
fragment INT
: '0' | [1-9] [0-9]* // no leading zeros
;
fragment EXP
: [Ee] [+\-]? INT // \- since - means "range" inside [...]
;
WS
: [ \t\n\r]+ -> skip
;
The error is:
line 2:2 no viable alternative at input '{journal'
pairkey akka.persistence
The sample input that gives the error is:
akka.persistence {
journal {
# Absolute path to the journal plugin configuration entry used by
# persistent actor or view by default.
# Persistent actor or view can override `journalPluginId` method
# in order to rely on a different journal plugin.
plugin = ""
}
}
However, if I update it to use quoted strings:
akka.persistence {
'journal' {
# Absolute path to the journal plugin configuration entry used by
# persistent actor or view by default.
# Persistent actor or view can override `journalPluginId` method
# in order to rely on a different journal plugin.
'plugin' = ""
}
}
everything works as expected.
It looks like I'm missing something in the KEY definition, but I can't figure out what exactly.
The Go code I use to test it is:
package main

import (
	"log"

	"github.com/antlr/antlr4/runtime/Go/antlr"
	"go-hocon/parser"
)

func main() {
	is, err := antlr.NewFileStream("test/simple1.conf")
	if err != nil {
		log.Fatal(err) // e.g. the test file is missing
	}
	lex := parser.NewHOCONLexer(is)
	// channel 0 is the default token channel
	p := parser.NewHOCONParser(antlr.NewCommonTokenStream(lex, 0))
	p.BuildParseTrees = true
	p.Hocon()
}
Answer 1
Score: 1
With your first input, journal is lexed as a RAWSTRING:
[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:28='journal',<RAWSTRING>,2:2]
[@3,30:30='{',<'{'>,2:10]
[@4,277:282='plugin',<RAWSTRING>,7:4]
[@5,284:284='=',<KV>,7:11]
[@6,286:287='""',<STRING>,7:13]
[@7,292:292='}',<'}'>,8:2]
[@8,295:295='}',<'}'>,9:0]
[@9,298:297='<EOF>',<EOF>,10:0]
line 2:2 no viable alternative at input '{journal'
On the other hand, 'journal' lexes as a STRING, but the token text keeps the single quotes, which you clearly don't want:
[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:30=''journal'',<STRING>,2:2] <-- now it's a STRING token (quotes included)
[@3,32:32='{',<'{'>,2:12]
[@4,279:284='plugin',<RAWSTRING>,7:4]
[@5,286:286='=',<KV>,7:11]
[@6,288:289='""',<STRING>,7:13]
[@7,294:294='}',<'}'>,8:2]
[@8,297:297='}',<'}'>,9:0]
[@9,300:299='<EOF>',<EOF>,10:0]
line 7:4 no viable alternative at input '{plugin'
line 8:2 mismatched input '}' expecting {'true', 'false', 'null', '[', '{', STRING, RAWSTRING, REFERENCE, KV, NUMBER}
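Why? Because the lexer resolves conflicts between its rules in the following way:
- Match the longest input first.
- Among matches of equal length, implicit tokens (literals used directly in parser rules, like '{' or 'true') win over the explicit lexer rules.
- Otherwise, if the matches have equal length, the lexer rule defined first in the grammar wins.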
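In your case, unquoted journal matches both RAWSTRING and KEY with the same length (7 characters), so RAWSTRING, which is defined first, wins; the same happens to plugin. Only akka.persistence becomes a KEY, because the '.' lets KEY match a longer input than RAWSTRING. Putting quotes around 'journal' makes it match the single-quoted alternative of STRING instead, so it seems to work okay, but only because of those quotes, which also end up in the token text. Without the quotes, these two tokens are matched as RAWSTRING, which doesn't fit the rule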
pair
: STRING KV? value //{fmt.Println("pairstr",$STRING.GetText())}
Hence the error.
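How to fix? Well, I reversed the order of the lexer rules (on its own this changes nothing, because STRING always starts with a quote and RAWSTRING never does):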
RAWSTRING
: (ESC | ALPHANUM)+
;
STRING
: '"' (ESC | ~ ["\\])* '"'
| '\'' (ESC | ~ ['\\])* '\''
;
And, more importantly, changed pair so that it accepts a RAWSTRING key:
pair
: RAWSTRING KV? value //{fmt.Println("pairstr",$RAWSTRING.GetText())}
| KEY KV? value //{fmt.Println("pairkey",$KEY.GetText())}
;
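Now it parses fine. The token stream is the same as before; the difference is that pair now accepts the RAWSTRING keys: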
[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:28='journal',<RAWSTRING>,2:2]
[@3,30:30='{',<'{'>,2:10]
[@4,277:282='plugin',<RAWSTRING>,7:4]
[@5,284:284='=',<KV>,7:11]
[@6,286:287='""',<STRING>,7:13]
[@7,292:292='}',<'}'>,8:2]
[@8,295:295='}',<'}'>,9:0]
[@9,298:297='<EOF>',<EOF>,10:0]
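To see why only the parser rule needed to change, here is a hedged Go sketch of a tiny recursive-descent parser for simplified obj/pair/value rules. It works on the token types from the dump above and takes the set of token types that pair accepts as a key, so the original rule (STRING, KEY), the fixed rule (assumed here to be RAWSTRING, KEY) and a variant that also keeps quoted keys can be compared. The names `parser` and `parse` are hypothetical, and arrays, literals and actions are left out.

```go
package main

import "fmt"

// tok is a token type name as in the dumps, e.g. "KEY" or "'{'".
type tok = string

// parser walks token types only; keyTypes is what pair accepts as a key.
type parser struct {
	toks     []tok
	pos      int
	keyTypes map[tok]bool
}

func (p *parser) peek() tok {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return "EOF"
}

// pair : <key type> KV? value
func (p *parser) pair() error {
	if !p.keyTypes[p.peek()] {
		return fmt.Errorf("no viable alternative at token %d (%s)", p.pos, p.peek())
	}
	p.pos++
	if p.peek() == "KV" {
		p.pos++
	}
	return p.value()
}

// value : STRING | REFERENCE | RAWSTRING | NUMBER | obj   (arrays/literals omitted)
func (p *parser) value() error {
	switch p.peek() {
	case "STRING", "REFERENCE", "RAWSTRING", "NUMBER":
		p.pos++
		return nil
	case "'{'":
		return p.obj()
	}
	return fmt.Errorf("unexpected %s at token %d", p.peek(), p.pos)
}

// obj : '{' pair (','? pair)* '}' | '{' '}'
func (p *parser) obj() error {
	p.pos++ // consume '{'
	if p.peek() == "'}'" {
		p.pos++
		return nil
	}
	for {
		if err := p.pair(); err != nil {
			return err
		}
		if p.peek() == "'}'" {
			p.pos++
			return nil
		}
		if p.peek() == "','" {
			p.pos++ // optional separator
		}
	}
}

// parse treats the whole input as a single pair (one alternative of hocon).
func parse(toks []tok, keyTypes ...tok) error {
	p := &parser{toks: toks, keyTypes: map[tok]bool{}}
	for _, k := range keyTypes {
		p.keyTypes[k] = true
	}
	if err := p.pair(); err != nil {
		return err
	}
	if p.peek() != "EOF" {
		return fmt.Errorf("extra input at token %d", p.pos)
	}
	return nil
}

func main() {
	// Token types from the dump of the unquoted input.
	toks := []tok{"KEY", "'{'", "RAWSTRING", "'{'", "RAWSTRING", "KV", "STRING", "'}'", "'}'"}
	fmt.Println(parse(toks, "STRING", "KEY"))              // original pair rule
	fmt.Println(parse(toks, "RAWSTRING", "KEY"))           // pair after the fix
	fmt.Println(parse(toks, "STRING", "RAWSTRING", "KEY")) // also keeps quoted keys
}
```

With STRING and KEY as key types the sketch stops at token 2 (journal), mirroring the "no viable alternative" error; with RAWSTRING in the set it accepts the whole stream. Since the fixed rule no longer lists STRING, keeping it as a third key type would let quoted keys such as 'journal' parse as well.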