Decode large stream JSON

Question

I have a massive JSON array stored in a file ("file.json").
I need to iterate through the array and perform some operation on each element.

err = json.Unmarshal(dat, &all_data)

This causes an out-of-memory error, presumably because it loads everything into memory first.

Is there a way to stream the JSON element by element?

Answer 1

Score: 42

The encoding/json documentation has an example of exactly this (the streaming example for Decoder.Decode):

package main

import (
	"encoding/json"
	"fmt"
	"log"
	"strings"
)

func main() {
	const jsonStream = `
		[
			{"Name": "Ed", "Text": "Knock knock."},
			{"Name": "Sam", "Text": "Who's there?"},
			{"Name": "Ed", "Text": "Go fmt."},
			{"Name": "Sam", "Text": "Go fmt who?"},
			{"Name": "Ed", "Text": "Go fmt yourself!"}
		]
	`
	type Message struct {
		Name, Text string
	}
	dec := json.NewDecoder(strings.NewReader(jsonStream))

	// read open bracket
	t, err := dec.Token()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%T: %v\n", t, t)

	// while the array contains values
	for dec.More() {
		var m Message
		// decode an array value (Message)
		err := dec.Decode(&m)
		if err != nil {
			log.Fatal(err)
		}

		fmt.Printf("%v: %v\n", m.Name, m.Text)
	}

	// read closing bracket
	t, err = dec.Token()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%T: %v\n", t, t)

}

Answer 2

Score: 4

So, as commenters suggested, you could use the streaming API of "encoding/json" to read one string at a time:

r := ... // get some io.Reader (e.g. open the big array file)
d := json.NewDecoder(r)
// read "["
d.Token()
// read strings one by one
for d.More() {
    s, _ := d.Token()
    // do something with s which is the newly read string
    fmt.Printf("read %q\n", s)
}
// (optionally) read "]"
d.Token()

Note that, for simplicity, I've left out error handling, which you will need to add.

huangapple
  • Published on August 4, 2015 02:32:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/31794355.html