Unmarshal a json stream (not newline-separated)
Question
I want to turn a stream of JSON into a stream of objects. This is easy to do with newline-separated JSON. From the Go docs: https://golang.org/pkg/encoding/json/#Decoder.Buffered
However, I need to generate a stream from JSON arrays like this one:
[{"Name": "Ed", "Text": "Knock knock."},
{"Name": "Sam", "Text": "Who's there?"},
{"Name": "Ed", "Text": "Go fmt."},
{"Name": "Sam", "Text": "Go fmt who?"},
{"Name": "Ed", "Text": "Go fmt yourself!"}]
What is a performant way to do this?
I've considered this method:
- Drop the outside square brackets
- When there are matching top-level curly braces, unmarshal the string between the braces (inclusive) to get one top-level object at a time.
I don't want to do it because of the performance implications of scanning each portion of the string twice.
The best alternative I can think of is to copy the source code for the decoder in the Golang encoding/json package and modify it so it returns a Reader that spits out one object at a time. But that seems like too much work for such a simple requirement.
Is there a better way to decode a stream that is a JSON array?
EDIT
I'm looking to parse JSON with nested objects and arbitrary structure.
Answer 1
Score: 1
You can use a streaming parser, for example megajson's scanner:
package main

import (
	"fmt"
	"strings"

	"github.com/benbjohnson/megajson/scanner"
)

func main() {
	// our incoming data
	rdr := strings.NewReader(`[
{"Name": "Ed", "Text": "Knock knock."},
{"Name": "Sam", "Text": "Who's there?"},
{"Name": "Ed", "Text": "Go fmt."},
{"Name": "Sam", "Text": "Go fmt who?"},
{"Name": "Ed", "Text": "Go fmt yourself!"}
]`)

	// we want to create a list of these
	type Object struct {
		Name string
		Text string
	}
	objects := make([]Object, 0)

	// scan the JSON as we read
	s := scanner.NewScanner(rdr)

	// this is how we keep track of where we are parsing the JSON
	// if you needed to support nested objects you would need to
	// use a stack here ([]state{}) and push / pop each time you
	// see a brace
	var state struct {
		inKey   bool
		lastKey string
		object  Object
	}

	for {
		tok, data, err := s.Scan()
		if err != nil {
			// any error ends the loop; production code should tell
			// the end of input apart from a real syntax error
			break
		}
		switch tok {
		case scanner.TLBRACE:
			// just saw '{' so start a new object
			state.inKey = true
			state.lastKey = ""
			state.object = Object{}
		case scanner.TRBRACE:
			// just saw '}' so store the object
			objects = append(objects, state.object)
		case scanner.TSTRING:
			// for `key: value`, we just parsed 'key'
			if state.inKey {
				state.lastKey = string(data)
			} else {
				// now we are on `value`; any key other than Name is treated as Text
				if state.lastKey == "Name" {
					state.object.Name = string(data)
				} else {
					state.object.Text = string(data)
				}
			}
			state.inKey = !state.inKey
		}
	}

	fmt.Println(objects)
}
This is probably as efficient as you can get, but it does require a lot of manual processing.
Answer 2
Score: 0
Assume the JSON stream looks like this:
{"Name": "Ed", "Text": "Knock knock."}{"Name": "Sam", "Text": "Who's there?"}{"Name": "Ed", "Text": "Go fmt."}
I have an idea; in pseudocode:
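1: Skip leading whitespace.
2: If the first character is not "{", return an error.
3: Load some characters and find the first "}".
4: If found, try json.Unmarshal() on everything up to and including that "}"; if it succeeds, emit the object and go back to step 1 with the rest of the stream.
5: If unmarshaling fails, load more characters and find the next "}".
6: Repeat step 4.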
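The steps above can be sketched in Go as follows, assuming the whole stream is already in memory (so "load more characters" becomes "look further ahead"). splitObjects is a hypothetical name, and each candidate is checked with json.Unmarshal into a json.RawMessage as in step 4:

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// splitObjects splits concatenated JSON objects by trying json.Unmarshal
// at each successive '}' until the candidate is a complete object.
func splitObjects(data []byte) ([]json.RawMessage, error) {
	var out []json.RawMessage
	for {
		data = bytes.TrimLeft(data, " \t\r\n") // step 1: skip leading whitespace
		if len(data) == 0 {
			return out, nil
		}
		if data[0] != '{' { // step 2
			return nil, errors.New("expected '{'")
		}
		end := 0
		for {
			i := bytes.IndexByte(data[end:], '}') // steps 3 and 5: next '}'
			if i < 0 {
				return nil, errors.New("incomplete JSON object")
			}
			end += i + 1
			var obj json.RawMessage
			if json.Unmarshal(data[:end], &obj) == nil { // step 4
				out = append(out, obj)
				break
			}
			// step 6: not a complete object yet; try the next '}'
		}
		data = data[end:]
	}
}

func main() {
	stream := []byte(`{"Name": "Ed", "Text": "Knock knock."}{"Name": "Sam", "Text": "Who's there?"}{"Name": "Ed", "Text": "Go fmt."}`)
	objs, err := splitObjects(stream)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, o := range objs {
		fmt.Println(string(o))
	}
}
```

A "}" inside a string or a nested object just produces a candidate that fails to unmarshal, so the loop moves on to the next one. Each retry re-validates the candidate from its start, so an object containing many "}" characters costs roughly quadratic time, which is the double-scanning concern raised in the question.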
Answer 3
Score: 0
Below is an implementation that is already working in my project:
package json

import (
	"bytes"
	j "encoding/json"
	"errors"
	"io"
	"strings"
)

// Stream represents a JSON stream of concatenated objects
// (separators such as the commas of an array are not skipped).
type Stream struct {
	stream *bytes.Buffer
	object *bytes.Buffer
	scrap  *bytes.Buffer
}

// NewStream returns a Stream that reads from src.
func NewStream(src []byte) *Stream {
	return &Stream{
		stream: bytes.NewBuffer(src),
		object: new(bytes.Buffer),
		scrap:  new(bytes.Buffer),
	}
}

// Read reads the next JSON object from the stream.
func (s *Stream) Read() ([]byte, error) {
	var obj []byte
	for {
		// read a rune from the stream
		r, _, err := s.stream.ReadRune()
		switch err {
		case nil:
		case io.EOF:
			// leftover non-whitespace means the last object was incomplete
			if strings.TrimSpace(s.object.String()) != "" {
				return nil, errors.New("invalid JSON")
			}
			fallthrough
		default:
			return nil, err
		}

		// write the rune to the object buffer
		if _, err := s.object.WriteRune(r); err != nil {
			return nil, err
		}

		if r == '}' {
			// check whether the buffered text is a complete, valid JSON value
			err := j.Compact(s.scrap, s.object.Bytes())
			s.scrap.Reset()
			if err != nil {
				continue
			}
			// copy: Bytes() aliases the buffer, and later writes after Reset would overwrite it
			obj = append([]byte(nil), s.object.Bytes()...)
			s.object.Reset()
			break
		}
	}
	return obj, nil
}
Usage looks like this:
// The caller imports io, encoding/json as j, and the json package above.
// TypeOne and TypeTwo are your own types, with Error and ID fields respectively.
func process(src []byte) error {
	s := json.NewStream(src)
	for {
		obj, err := s.Read()
		switch err {
		case nil:
		case io.EOF:
			return nil
		default:
			return err
		}

		// now you can try to decode obj into a struct, map, etc.
		// it also supports mixed streams, e.g.:
		a := new(TypeOne)
		b := new(TypeTwo)
		if err := j.Unmarshal(obj, a); err == nil && a.Error != "" {
			// it is a TypeOne object
		} else if err := j.Unmarshal(obj, b); err == nil && b.ID != "" {
			// it is a TypeTwo object
		} else {
			// unknown type
		}
	}
}