Antlr4 - no viable alternative at input

Question

I am trying to create a simple HOCON parser, starting from the existing JSON grammar.

The grammar is defined as:

```antlr
/** Taken from "The Definitive ANTLR 4 Reference" by Terence Parr */
// Derived from http://json.org
grammar HOCON;

hocon
    : value
    | pair
    ;

obj
    : object_begin pair (','? pair)* object_end
    | object_begin object_end
    ;

pair
    : STRING KV? value {fmt.Println("pairstr",$STRING.GetText())}
    | KEY KV? value {fmt.Println("pairkey",$KEY.GetText())}
    ;

array
    : array_begin value (',' value)* array_end
    | array_begin array_end
    ;

value
    : STRING {fmt.Println($STRING.GetText())}
    | REFERENCE {fmt.Println($REFERENCE.GetText())}
    | RAWSTRING {fmt.Println($RAWSTRING.GetText())}
    | NUMBER {fmt.Println($NUMBER.GetText())}
    | obj
    | array
    | 'true'
    | 'false'
    | 'null'
    ;

COMMENT
    : '#' ~( '\r' | '\n' )* -> skip
    ;

STRING
    : '"' (ESC | ~ ["\\])* '"'
    | '\'' (ESC | ~ ['\\])* '\''
    ;

RAWSTRING
    : (ESC | ALPHANUM)+
    ;

KEY
    : ( '.' | ALPHANUM | '-')+
    ;

REFERENCE
    : '${' (ALPHANUM|'.')+ '}'
    ;

fragment ESC
    : '\\' (["\\/bfnrt] | UNICODE)
    ;

fragment UNICODE
    : 'u' HEX HEX HEX HEX
    ;

fragment ALPHANUM
    : [0-9a-zA-Z]
    ;

fragment HEX
    : [0-9a-fA-F]
    ;

KV
    : [=:]
    ;

array_begin
    : '[' { fmt.Println("BEGIN [") }
    ;

array_end
    : ']' { fmt.Println("] END") }
    ;

object_begin
    : '{' { fmt.Println("OBJ {") }
    ;

object_end
    : '}' { fmt.Println("} OBJ") }
    ;

NUMBER
    : '-'? INT '.' [0-9]+ EXP? | '-'? INT EXP | '-'? INT
    ;

fragment INT
    : '0' | [1-9] [0-9]* // no leading zeros
    ;

fragment EXP
    : [Ee] [+\-]? INT // \- since - means "range" inside [...]
    ;

WS
    : [ \t\n\r]+ -> skip
    ;
```

The error is:

```
line 2:2 no viable alternative at input '{journal'
pairkey akka.persistence
```

The sample input that triggers the error is:

```
akka.persistence {
  journal {
    # Absolute path to the journal plugin configuration entry used by
    # persistent actor or view by default.
    # Persistent actor or view can override `journalPluginId` method
    # in order to rely on a different journal plugin.
    plugin = ""
  }
}
```

However, if I update it to use quoted strings:

```
akka.persistence {
  'journal' {
    # Absolute path to the journal plugin configuration entry used by
    # persistent actor or view by default.
    # Persistent actor or view can override `journalPluginId` method
    # in order to rely on a different journal plugin.
    'plugin' = ""
  }
}
```

everything works as expected.

It looks like I am missing something in the KEY definition, but I can't figure out what exactly.

The Go code to test it out is:

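```go
package main

import (
	"log"

	"github.com/antlr/antlr4/runtime/Go/antlr"
	"go-hocon/parser" // package generated from HOCON.g4
)

func main() {
	// Read the sample file; stop with a clear message if it cannot be opened.
	is, err := antlr.NewFileStream("test/simple1.conf")
	if err != nil {
		log.Fatal(err)
	}
	lex := parser.NewHOCONLexer(is)
	// Feed the parser the tokens on the default channel (channel 0).
	p := parser.NewHOCONParser(antlr.NewCommonTokenStream(lex, antlr.TokenDefaultChannel))
	p.BuildParseTrees = true
	p.Hocon() // start rule of the grammar
}
```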

Answer 1

Score: 1

Your first input makes journal lex as a RAWSTRING.

```
[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:28='journal',<RAWSTRING>,2:2]
[@3,30:30='{',<'{'>,2:10]
[@4,277:282='plugin',<RAWSTRING>,7:4]
[@5,284:284='=',<KV>,7:11]
[@6,286:287='""',<STRING>,7:13]
[@7,292:292='}',<'}'>,8:2]
[@8,295:295='}',<'}'>,9:0]
[@9,298:297='<EOF>',<EOF>,10:0]
line 2:2 no viable alternative at input '{journal'
```

On the other hand, 'journal' lexes as a STRING, but it keeps the single quotes, which you clearly don't want (in this run only journal is quoted, which is why plugin still fails):

```
[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:30=''journal'',<STRING>,2:2]    <-- now it's a STRING token
[@3,32:32='{',<'{'>,2:12]
[@4,279:284='plugin',<RAWSTRING>,7:4]
[@5,286:286='=',<KV>,7:11]
[@6,288:289='""',<STRING>,7:13]
[@7,294:294='}',<'}'>,8:2]
[@8,297:297='}',<'}'>,9:0]
[@9,300:299='<EOF>',<EOF>,10:0]
line 7:4 no viable alternative at input '{plugin'
line 8:2 mismatched input '}' expecting {'true', 'false', 'null', '[', '{', STRING, RAWSTRING, REFERENCE, KV, NUMBER}
```

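Why? Because ANTLR's lexer resolves competing rules as follows:

  1. Match the longest input first.
  2. If several rules match input of the same length, implicit tokens (literals written directly in parser rules, such as '{' or 'true') win over explicit lexer rules.
  3. Otherwise, among equal-length matches, the lexer rule defined first in the grammar wins.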
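Rules 1 and 3 can be sketched in Go. This is a hypothetical simulation, not ANTLR's lexer: each of the three competing rules from the question is approximated by an anchored regular expression (ESC sequences and implicit tokens are not modeled), and classify is an illustrative helper that reports which rule wins at the start of an input string.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule approximates one ANTLR lexer rule with an anchored regular expression.
// Simplification: ESC sequences are not modeled.
type rule struct {
	name string
	re   *regexp.Regexp
}

// questionRules keeps the declaration order of the question's grammar:
// STRING, then RAWSTRING, then KEY.
var questionRules = []rule{
	{"STRING", regexp.MustCompile(`^(?:"[^"\\]*"|'[^'\\]*')`)},
	{"RAWSTRING", regexp.MustCompile(`^[0-9a-zA-Z]+`)},
	{"KEY", regexp.MustCompile(`^[.0-9a-zA-Z-]+`)},
}

// classify returns the winning rule name and the matched text for the start
// of input. The longest match wins (rule 1); because the comparison is
// strictly greater, an earlier rule keeps the win on a tie (rule 3).
func classify(rules []rule, input string) (string, string) {
	bestName, bestText := "", ""
	for _, r := range rules {
		if m := r.re.FindString(input); len(m) > len(bestText) {
			bestName, bestText = r.name, m
		}
	}
	return bestName, bestText
}

func main() {
	inputs := []string{"akka.persistence {", "journal {", "'journal' {", `plugin = ""`}
	for _, in := range inputs {
		name, text := classify(questionRules, in)
		fmt.Printf("%q lexes as %s %q\n", in, name, text)
	}
}
```

Running it reports akka.persistence as KEY (only KEY can consume the dot, so it is longest), journal and plugin as RAWSTRING (tie with KEY, RAWSTRING declared first), and 'journal' as STRING. Listing KEY before RAWSTRING in questionRules turns journal into a KEY, which is rule 3 at work.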

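In your case, quoting 'journal' makes it match the single-quoted alternative of the STRING rule, and pair accepts a STRING key, so it seems to work. But that is only because of the quotes. Without them, journal and plugin are matched by both RAWSTRING and KEY with the same length, so rule 3 applies and, since RAWSTRING is defined before KEY, they become RAWSTRING tokens. (akka.persistence still lexes as a KEY: only KEY can match the dot, so per rule 1 it gives the longest match.) A RAWSTRING fits neither alternative of the rule:

```antlr
pair
    : STRING KV? value {fmt.Println("pairstr",$STRING.GetText())}
    | KEY KV? value {fmt.Println("pairkey",$KEY.GetText())}
    ;
```

Hence the error.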

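How to fix? First, I reversed the order of the STRING and RAWSTRING lexer rules (this reordering is harmless but is not what fixes the error: STRING always starts with a quote and RAWSTRING never does, so the two rules never compete):

```antlr
RAWSTRING
    : (ESC | ALPHANUM)+
    ;

STRING
    : '"' (ESC | ~ ["\\])* '"'
    | '\'' (ESC | ~ ['\\])* '\''
    ;
```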

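The change that actually fixes the error is in pair: its first alternative now takes a RAWSTRING key (the KEY alternative stays as it was):

```antlr
pair
    : RAWSTRING KV? value //{fmt.Println("pairstr",$RAWSTRING.GetText())}
```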

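Now it parses fine. The token stream is identical to the first one; the only difference is that the parser no longer reports an error: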

```
[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:28='journal',<RAWSTRING>,2:2]
[@3,30:30='{',<'{'>,2:10]
[@4,277:282='plugin',<RAWSTRING>,7:4]
[@5,284:284='=',<KV>,7:11]
[@6,286:287='""',<STRING>,7:13]
[@7,292:292='}',<'}'>,8:2]
[@8,295:295='}',<'}'>,9:0]
[@9,298:297='<EOF>',<EOF>,10:0]
```
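Why the same token stream is rejected by the original pair rule and accepted by the changed one can be sketched with a tiny hand-written recursive-descent parser in Go. This is a hypothetical approximation, not the parser ANTLR generates: it works on the token type names from the dump above, models only hocon, obj, pair and value (no arrays or 'true'/'false'/'null'), and keyTokens is an illustrative stand-in for the token types a pair may start with. It assumes, as the snippet suggests, that the changed rule has a RAWSTRING alternative and a KEY alternative.

```go
package main

import "fmt"

// miniParser is a hypothetical recursive-descent approximation of the
// hocon/obj/pair/value rules. It only looks at token type names.
type miniParser struct {
	toks      []string
	pos       int
	keyTokens map[string]bool // token types accepted as the key of a pair
}

func (p *miniParser) peek() string { return p.toks[p.pos] }

// hocon : value | pair ;  (tries pair when the first token can start one)
func (p *miniParser) hocon() error {
	var err error
	if p.keyTokens[p.peek()] {
		err = p.pair()
	} else {
		err = p.value()
	}
	if err == nil && p.peek() != "EOF" {
		err = fmt.Errorf("extra input %s at token %d", p.peek(), p.pos)
	}
	return err
}

// pair : <key> KV? value ;
func (p *miniParser) pair() error {
	if !p.keyTokens[p.peek()] {
		return fmt.Errorf("no viable alternative at token %d (%s)", p.pos, p.peek())
	}
	p.pos++
	if p.peek() == "KV" {
		p.pos++
	}
	return p.value()
}

// value : STRING | REFERENCE | RAWSTRING | NUMBER | obj ;
func (p *miniParser) value() error {
	switch p.peek() {
	case "STRING", "REFERENCE", "RAWSTRING", "NUMBER":
		p.pos++
		return nil
	case "{":
		return p.obj()
	}
	return fmt.Errorf("unexpected %s at token %d", p.peek(), p.pos)
}

// obj : '{' (pair ','?)* '}' ;  (slightly more permissive than the grammar's obj)
func (p *miniParser) obj() error {
	p.pos++ // consume '{'
	for p.peek() != "}" {
		if err := p.pair(); err != nil {
			return err
		}
		if p.peek() == "," {
			p.pos++
		}
	}
	p.pos++ // consume '}'
	return nil
}

// tokens holds the token types from the dump for the unquoted input.
var tokens = []string{"KEY", "{", "RAWSTRING", "{", "RAWSTRING", "KV", "STRING", "}", "}", "EOF"}

func parse(keyTokens map[string]bool) error {
	p := &miniParser{toks: tokens, keyTokens: keyTokens}
	return p.hocon()
}

func main() {
	original := map[string]bool{"STRING": true, "KEY": true}
	changed := map[string]bool{"RAWSTRING": true, "KEY": true}
	fmt.Println("original pair:", parse(original))
	fmt.Println("changed pair:", parse(changed))
}
```

With the original key set the sketch stops at token 2 (the journal RAWSTRING), mirroring ANTLR's "no viable alternative at input '{journal'"; with the changed set it reaches EOF. Note that under that assumption a quoted key such as 'journal' is no longer accepted, because STRING is not in the changed key set; a pair rule with STRING, RAWSTRING and KEY alternatives would accept all three.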

huangapple
  • This article was posted on July 7, 2017 10:37:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/44961694.html