Decode large stream JSON
Question
I have a massive JSON array stored in a file ("file.json").
I need to iterate through the array and perform some operation on each element.
    err = json.Unmarshal(dat, &all_data)
This causes an out-of-memory error. I'm guessing that is because it loads everything into memory first.
Is there a way to stream the JSON element by element?
Answer 1
Score: 42
There is an example of this sort of thing in the encoding/json documentation:
    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "strings"
    )

    func main() {
        const jsonStream = `
        [
            {"Name": "Ed", "Text": "Knock knock."},
            {"Name": "Sam", "Text": "Who's there?"},
            {"Name": "Ed", "Text": "Go fmt."},
            {"Name": "Sam", "Text": "Go fmt who?"},
            {"Name": "Ed", "Text": "Go fmt yourself!"}
        ]
    `
        type Message struct {
            Name, Text string
        }

        dec := json.NewDecoder(strings.NewReader(jsonStream))

        // read open bracket
        t, err := dec.Token()
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%T: %v\n", t, t)

        // while the array contains values
        for dec.More() {
            var m Message
            // decode an array value (Message)
            err := dec.Decode(&m)
            if err != nil {
                log.Fatal(err)
            }
            fmt.Printf("%v: %v\n", m.Name, m.Text)
        }

        // read closing bracket
        t, err = dec.Token()
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%T: %v\n", t, t)
    }
Answer 2
Score: 4
As commenters suggested, you can use the streaming API of "encoding/json" to read one string at a time:
    r := ... // get some io.Reader (e.g. open the big array file)
    d := json.NewDecoder(r)
    // read "["
    d.Token()
    // read strings one by one
    for d.More() {
        s, _ := d.Token()
        // do something with s which is the newly read string
        fmt.Printf("read %q\n", s)
    }
    // (optionally) read "]"
    d.Token()
Note that for simplicity I've left out error handling, which you will need to implement.