How to create a parser

Question

I want to build a parser, but I am having trouble understanding how to do it.

Here is a sample string I would like to parse:

    {key1 = value1 | key2 = {key3 = value3} | key4 = {key5 = { key6 = value6 }}}

Preferably, I would like to get output similar to a nested map:

    map[key1] = value1
    map[key2] = (map[key3] = value3)
    map[key4] = (map[key5] = (map[key6] = value6))

How could this be done? Am I aiming in the wrong direction?

Answer 1

Score: 36

Writing a parser is a complicated topic, too big to cover in a single answer.

Rob Pike gave an excellent talk that walks through writing a lexer (which is half of a parser) in Go: http://www.youtube.com/watch?v=HxaD_trXwRE

You should also look at existing parser code, e.g. the one in the Go standard library, for an example of how to do it: http://golang.org/src/pkg/go/parser/parser.go

There are also plenty of resources on parsing on the internet. They may use examples in other languages, but it's just a matter of translating the syntax to Go.

I recommend reading up on recursive descent parsing (e.g. http://www.cs.binghamton.edu/~zdu/parsdemo/recintro.html) or top-down parsing (e.g. http://javascript.crockford.com/tdop/tdop.html, http://effbot.org/zone/simple-top-down-parsing.htm).

Answer 2

Score: 34

What about using the standard goyacc tool? Here is a skeleton:

main.y

    %{
    package main

    import (
        "fmt"
        "log"
    )
    %}

    %union{
        tok   int
        val   interface{}
        pair  struct{ key, val interface{} }
        pairs map[interface{}]interface{}
    }

    %token KEY
    %token VAL

    %type <val> KEY VAL
    %type <pair> pair
    %type <pairs> pairs

    %%

    goal:
        '{' pairs '}'
        {
            // Hand the finished map to the lexer so main can read it.
            yylex.(*lex).m = $2
        }

    pairs:
        pair
        {
            $$ = map[interface{}]interface{}{$1.key: $1.val}
        }
    |   pairs '|' pair
        {
            // $$ starts out as $1, the map built so far.
            $$[$3.key] = $3.val
        }

    pair:
        KEY '=' VAL
        {
            $$.key, $$.val = $1, $3
        }
    |   KEY '=' '{' pairs '}'
        {
            $$.key, $$.val = $1, $4
        }

    %%

    type token struct {
        tok int
        val interface{}
    }

    // lex feeds a pre-built token slice to the parser and stores the result.
    type lex struct {
        tokens []token
        m      map[interface{}]interface{}
    }

    // Lex returns the next token, or 0 at end of input.
    func (l *lex) Lex(lval *yySymType) int {
        if len(l.tokens) == 0 {
            return 0
        }

        v := l.tokens[0]
        l.tokens = l.tokens[1:]
        lval.val = v.val
        return v.tok
    }

    func (l *lex) Error(e string) {
        log.Fatal(e)
    }

    func main() {
        l := &lex{
            // {key1 = value1 | key2 = {key3 = value3} | key4 = {key5 = { key6 = value6 }}}
            []token{
                {'{', ""},
                {KEY, "key1"},
                {'=', ""},
                {VAL, "value1"},
                {'|', ""},
                {KEY, "key2"},
                {'=', ""},
                {'{', ""},
                {KEY, "key3"},
                {'=', ""},
                {VAL, "value3"},
                {'}', ""},
                {'|', ""},
                {KEY, "key4"},
                {'=', ""},
                {'{', ""},
                {KEY, "key5"},
                {'=', ""},
                {'{', ""},
                {KEY, "key6"},
                {'=', ""},
                {VAL, "value6"},
                {'}', ""},
                {'}', ""},
                {'}', ""},
            },
            map[interface{}]interface{}{},
        }
        yyParse(l)
        fmt.Println(l.m)
    }

Output

    $ go tool yacc -o main.go main.y && go run main.go
    map[key4:map[key5:map[key6:value6]] key1:value1 key2:map[key3:value3]]
    $

Answer 3

Score: 5

Be advised that, as of Go 1.8 (in beta in Q4 2016, released in Q1 2017):

> The yacc tool (previously available by running “go tool yacc”) has been removed. As of Go 1.7 it was no longer used by the Go compiler.

> It has moved to the “tools” repository and is now available at golang.org/x/tools/cmd/goyacc.

Answer 4

Score: 2

That particular format is very similar to JSON. You could use the following code to leverage that similarity:

    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "strconv"
        "strings"
        "text/scanner"
    )

    func main() {
        txt := `{key1 = "\"value1\"\n" | key2 = { key3 = 10 } | key4 = {key5 = { key6 = value6}}}`

        var s scanner.Scanner
        s.Init(strings.NewReader(txt))

        // Rewrite the token stream into JSON text.
        var b []byte
    loop:
        for {
            switch tok := s.Scan(); tok {
            case scanner.EOF:
                break loop
            case '|':
                b = append(b, ',') // pair separator
            case '=':
                b = append(b, ':') // key/value separator
            case scanner.Ident:
                // Bare words become quoted JSON strings.
                b = append(b, strconv.Quote(s.TokenText())...)
            default:
                // Braces, numbers, and quoted strings pass through unchanged.
                b = append(b, s.TokenText()...)
            }
        }

        var m map[string]interface{}
        if err := json.Unmarshal(b, &m); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%#v\n", m)
    }

Answer 5

Score: 1

Would you like to try a Go edition of parsec? I wrote a rune-based (Unicode-aware) fork of goparsec (https://github.com/sanyaade-buildtools/goparsec), available at https://github.com/Dwarfartisan/goparsec.

Haskell's parsec is a powerful tool for building parsers. Pugs, the first Perl 6 implementation, was written with it. My Go edition is not as simple as yacc, but it is easier to use than yacc.

For this example, I wrote the following code:

parser.go

    package main

    import (
        "fmt"

        psc "github.com/Dwarfartisan/goparsec"
    )

    type kv struct {
        key   string
        value interface{}
    }

    // tchar matches any rune allowed in an unquoted token.
    var tchar = psc.NoneOf("|{}= ")

    // escaped matches one rune inside a quoted string, resolving backslash escapes.
    func escaped(st psc.ParseState) (interface{}, error) {
        _, err := psc.Try(psc.Rune('\\'))(st)
        if err == nil {
            r, err := psc.AnyRune(st)
            if err == nil {
                switch r.(rune) {
                case 't':
                    return '\t', nil
                case '"':
                    return '"', nil
                case 'n':
                    return '\n', nil
                case '\\':
                    return '\\', nil
                default:
                    return nil, st.Trap("Unknown escape \\%c", r)
                }
            } else {
                return nil, err
            }
        } else {
            return psc.NoneOf("\"")(st)
        }
    }

    // token is either a double-quoted string or a run of tchar runes.
    var token = psc.Either(
        psc.Between(psc.Rune('"'), psc.Rune('"'),
            psc.Try(psc.Bind(psc.Many1(escaped), psc.ReturnString))),
        psc.Bind(psc.Many1(tchar), psc.ReturnString))

    // syms matches the rune r, skipping spaces on both sides.
    func syms(r rune) psc.Parser {
        return func(st psc.ParseState) (interface{}, error) {
            _, err := psc.Bind_(psc.Bind_(psc.Many(psc.Space), psc.Rune(r)), psc.Many(psc.Space))(st)
            if err == nil {
                return r, nil
            } else {
                return nil, err
            }
        }
    }

    var lbracket = syms('{')
    var rbracket = syms('}')
    var eql = syms('=')
    var vbar = syms('|')

    // pair parses: token '=' (token | mapExpr).
    func pair(st psc.ParseState) (interface{}, error) {
        left, err := token(st)
        if err != nil {
            return nil, err
        }

        right, err := psc.Bind_(eql, psc.Either(psc.Try(token), mapExpr))(st)
        if err != nil {
            return nil, err
        }
        return kv{left.(string), right}, nil
    }

    // pairs parses one or more pairs separated by '|'.
    func pairs(st psc.ParseState) (interface{}, error) {
        return psc.SepBy1(pair, vbar)(st)
    }

    // mapExpr parses '{' ... '}' holding either a single pair or a list of pairs.
    func mapExpr(st psc.ParseState) (interface{}, error) {
        p, err := psc.Try(psc.Between(lbracket, rbracket, pair))(st)
        if err == nil {
            return p, nil
        }
        ps, err := psc.Between(lbracket, rbracket, pairs)(st)
        if err == nil {
            return ps, nil
        } else {
            return nil, err
        }
    }

    // makeMap converts the parse result (kv values and slices of kv) into nested maps.
    func makeMap(data interface{}) interface{} {
        ret := make(map[string]interface{})
        switch val := data.(type) {
        case kv:
            ret[val.key] = makeMap(val.value)
        case string:
            return data
        case []interface{}:
            for _, item := range val {
                it := item.(kv)
                ret[it.key] = makeMap(it.value)
            }
        }
        return ret
    }

    func main() {
        input := `{key1 = "\"value1\"\n" | key2 = { key3 = 10 } | key4 = {key5 = { key6 = value6}}}`
        st := psc.MemoryParseState(input)
        // Parse first, then convert the parse result into nested maps.
        ret, err := mapExpr(st)
        if err == nil {
            fmt.Println(makeMap(ret))
        } else {
            fmt.Println(err)
        }
    }

Run

    go run parser.go

Output

    map[key1:"value1"
    key2:map[key3:10] key4:map[key5:map[key6:value6]]]

This demo covers escapes, tokens, strings, and key/value maps. You can build the parser as a package or as an application.

Answer 6

Score: 1

If you are willing to convert your input to standard JSON, why write a parser when there are Go libraries that do the heavy lifting for you?

Given the following input file (/Users/lex/dev/go/data/jsoncfgo/fritjof.json):

Input File

    {
        "key1": "value1",
        "key2": {
            "key3": "value3"
        },
        "key4": {
            "key5": {
                "key6": "value6"
            }
        }
    }

Code Example

    package main

    import (
        "fmt"
        "log"

        "github.com/l3x/jsoncfgo"
    )

    func main() {
        configPath := "/Users/lex/dev/go/data/jsoncfgo/fritjof.json"
        cfg, err := jsoncfgo.ReadFile(configPath)
        if err != nil {
            log.Fatal(err.Error()) // Handle error here
        }

        key1 := cfg.RequiredString("key1")
        fmt.Printf("key1: %v\n\n", key1)

        key2 := cfg.OptionalObject("key2")
        fmt.Printf("key2: %v\n\n", key2)

        key4 := cfg.OptionalObject("key4")
        fmt.Printf("key4: %v\n\n", key4)

        // Validate reports any problems collected while reading the keys above.
        if err := cfg.Validate(); err != nil {
            log.Fatalf("ERROR - Invalid config file...\n%v", err)
        }
    }

Output

    key1: value1
    key2: map[key3:value3]
    key4: map[key5:map[key6:value6]]

Notes

jsoncfgo can handle any level of nested JSON objects.

For details see:

huangapple
  • Posted on December 8, 2011 at 04:34:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/8422146.html