How to write a custom SplitFunc for bufio.Scanner that scans JSON objects

Question

I have code like this:

scanner := bufio.NewScanner(reader)
scanner.Split(splitJSON)

for scanner.Scan() {
	bb := scanner.Bytes()
}

I would like to get only valid JSON objects from the Scanner, one at a time. In some cases the Scanner's input may contain bytes like this, where the second object is still incomplete:

{
    "some_object": "name",
    "some_fileds": {}
}
{
    "some_object": 
}

I need only the first part:

{
    "some_object": "name",
    "some_fileds": {}
}

For the rest, I should wait until the end of the JSON object arrives.

I have a function like this, but it's horrible and doesn't work.

func splitJSON(
	bb []byte, atEOF bool,
) (advance int, token []byte, err error) {
	print(string(bb))
	if len(bb) < 10 {
		return 0, nil, nil
	}

	var nested, from, to int
	var end bool

	for i, b := range bb {
		if string(b) == "{" {
			if end {
				to = i

				break
			}

			if nested == 0 {
				from = i
			}

			nested++
		}

		if string(b) == "}" {
			nested--
			if nested == 0 {
				to = i
				end = true
			}
		}
	}

	if atEOF {
		return len(bb), bb, nil
	}

	return len(bb[from:to]), bb[from:to], nil
}
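
The function above counts braces but also counts the ones inside string values, cuts the token one byte short of the closing brace, and returns the whole buffer at EOF. As a rough sketch of the same brace-counting idea (not the asker's code and not a validator), the version below skips string literals and reports an error for an object that never closes. The names splitObjects and scanObjects are hypothetical, and the sketch assumes the input is a sequence of top-level objects separated by whitespace.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// splitObjects is a hypothetical SplitFunc that returns one top-level {...}
// span per token. It tracks brace depth and ignores braces inside string
// literals, but it does not validate the JSON.
func splitObjects(data []byte, atEOF bool) (advance int, token []byte, err error) {
	depth, start := 0, -1
	inString, escaped := false, false
	for i, b := range data {
		if inString {
			switch {
			case escaped:
				escaped = false
			case b == '\\':
				escaped = true
			case b == '"':
				inString = false
			}
			continue
		}
		switch b {
		case '"':
			inString = true
		case '{':
			if depth == 0 {
				start = i
			}
			depth++
		case '}':
			if depth > 0 {
				depth--
				if depth == 0 {
					return i + 1, data[start : i+1], nil
				}
			}
		}
	}
	switch {
	case start < 0:
		// No object has started yet: drop the buffered whitespace.
		return len(data), nil, nil
	case atEOF:
		// The input ended inside an object.
		return 0, nil, io.ErrUnexpectedEOF
	default:
		// Drop anything before the object and ask for more data.
		return start, nil, nil
	}
}

// scanObjects runs splitObjects over input and collects the tokens.
func scanObjects(input string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(splitObjects)
	var out []string
	for scanner.Scan() {
		out = append(out, scanner.Text())
	}
	return out, scanner.Err()
}

func main() {
	objs, err := scanObjects("{\"a\": \"}{\", \"b\": {}}\n{\"c\": ")
	fmt.Println(objs, err)
}
```

Because the loop rescans the buffer from the start on every call, this is only reasonable for small objects. A Scanner also rejects tokens larger than bufio.MaxScanTokenSize (64 KiB) unless scanner.Buffer is called with a bigger limit.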

Update:
I solved it with this SplitFunc:

func splitJSON(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}

	reader := bytes.NewReader(data)
	dec := json.NewDecoder(reader)

	var raw json.RawMessage
	if err := dec.Decode(&raw); err != nil {
		return 0, nil, nil
	}

	return len(raw) + 1, raw, nil
}
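
One caveat with the update above: len(raw) + 1 assumes that exactly one byte of whitespace surrounds each object, so it can advance past the end of the buffer when the last object has no trailing newline, and it silently drops a truncated object at EOF. A possible variation, sketched here under the assumption that Go 1.15 or later is available, takes the advance from Decoder.InputOffset and separates "need more data" from real errors. The names splitJSONValues and scanValues are hypothetical, and the sketch assumes the top-level values are objects or arrays (a bare number at the end of a partial buffer could be cut short).

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// splitJSONValues is a hypothetical SplitFunc that yields one JSON value per
// token and lets encoding/json decide where each value ends.
func splitJSONValues(data []byte, atEOF bool) (advance int, token []byte, err error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	var raw json.RawMessage
	switch err := dec.Decode(&raw); {
	case err == nil:
		// InputOffset is the position just past the decoded value,
		// counting any whitespace that preceded it.
		return int(dec.InputOffset()), raw, nil
	case err == io.EOF:
		// Only whitespace (or nothing) is buffered: discard it.
		return len(data), nil, nil
	case err == io.ErrUnexpectedEOF && !atEOF:
		// The value is incomplete: ask the Scanner for more data.
		return 0, nil, nil
	default:
		// Malformed JSON, or input that ended in the middle of a value.
		return 0, nil, err
	}
}

// scanValues runs splitJSONValues over input and collects the tokens.
func scanValues(input string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(splitJSONValues)
	var out []string
	for scanner.Scan() {
		out = append(out, scanner.Text())
	}
	return out, scanner.Err()
}

func main() {
	vals, err := scanValues("  {\"a\": 1}\n\n{\"b\": {\"c\": [1, 2]}}")
	fmt.Println(len(vals), vals, err)
}
```

With this shape, scanner.Err() reports a syntax error or an unexpected end of input after the loop, instead of the Scanner stopping without explanation.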

Answer 1

Score: 1

Use json.Decoder for this. Each Decoder.Decode() call will decode the next JSON-encoded value from the input, JSON objects in your case.

If you don't want to decode the JSON objects and only need the raw JSON data (a byte slice), unmarshal into a json.RawMessage.

For example:

package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

func main() {
	reader := strings.NewReader(src)
	dec := json.NewDecoder(reader)

	for {
		// RawMessage keeps the bytes of each value without decoding them further.
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			if err == io.EOF {
				break
			}
			fmt.Println("Error:", err)
			return
		}
		fmt.Println("Next:", string(raw))
	}
}

const src = `{
    "some_object": "name",
    "some_fileds": {}
}
{
    "some_object": "foo"
}`

This will output:

Next: {
    "some_object": "name",
    "some_fileds": {}
}
Next: {
    "some_object": "foo"
}
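
The question also asks about waiting for the end of an object that has only partly arrived. As a small illustration (an assumption-laden sketch, not part of the original answer), the program below feeds the Decoder a few bytes per Read through a hypothetical chunkReader: Decode keeps reading until a value is complete, and input that ends in the middle of an object comes back as an error rather than a partial value. The names chunkReader, decodeAll and decodeChunked are made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chunkReader is a hypothetical helper that returns at most n bytes per Read
// (n must be positive), simulating data that arrives in small pieces.
type chunkReader struct {
	r io.Reader
	n int
}

func (c chunkReader) Read(p []byte) (int, error) {
	if len(p) > c.n {
		p = p[:c.n]
	}
	return c.r.Read(p)
}

// decodeAll collects the raw bytes of every JSON value in r until EOF.
func decodeAll(r io.Reader) ([]string, error) {
	dec := json.NewDecoder(r)
	var out []string
	for {
		var raw json.RawMessage
		err := dec.Decode(&raw)
		if err == io.EOF {
			return out, nil // clean end of input
		}
		if err != nil {
			return out, err // syntax error or truncated value
		}
		out = append(out, string(raw))
	}
}

// decodeChunked decodes src while delivering it n bytes at a time.
func decodeChunked(src string, n int) ([]string, error) {
	return decodeAll(chunkReader{r: strings.NewReader(src), n: n})
}

func main() {
	src := `{"some_object": "name", "some_fileds": {}} {"some_object": "foo"}`
	vals, err := decodeChunked(src, 3)
	fmt.Println(len(vals), err)
	for _, v := range vals {
		fmt.Println("Next:", v)
	}
}
```

When the source is a network connection or a pipe, the same loop simply blocks inside Decode until the rest of the object arrives, so no custom SplitFunc is needed.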

huangapple
  • Published on September 20, 2022, 19:43:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/73786151.html