Decode large stream JSON

Question

I have a massive JSON array stored in a file ("file.json"). I need to iterate through the array and perform some operation on each element.

    err = json.Unmarshal(dat, &all_data)

This runs out of memory, I'm guessing because it loads everything into memory first.

Is there a way to stream the JSON element by element?

Answer 1

Score: 42

There is an example of this sort of thing in the encoding/json documentation:

    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "strings"
    )

    func main() {
        const jsonStream = `
        [
            {"Name": "Ed", "Text": "Knock knock."},
            {"Name": "Sam", "Text": "Who's there?"},
            {"Name": "Ed", "Text": "Go fmt."},
            {"Name": "Sam", "Text": "Go fmt who?"},
            {"Name": "Ed", "Text": "Go fmt yourself!"}
        ]
    `
        type Message struct {
            Name, Text string
        }

        dec := json.NewDecoder(strings.NewReader(jsonStream))

        // read open bracket
        t, err := dec.Token()
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%T: %v\n", t, t)

        // while the array contains values
        for dec.More() {
            var m Message
            // decode an array value (Message)
            err := dec.Decode(&m)
            if err != nil {
                log.Fatal(err)
            }
            fmt.Printf("%v: %v\n", m.Name, m.Text)
        }

        // read closing bracket
        t, err = dec.Token()
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%T: %v\n", t, t)
    }
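The example above decodes from an in-memory string. To apply the same loop to the file from the question, the decoder can be given an open file instead. Below is a minimal sketch under a few assumptions: decodeArray and openInput are names made up for illustration (they are not part of encoding/json), the elements are assumed to match the Message struct from the example, and the built-in sample exists only so the program runs without a file. The decoder reads through its own internal buffer and decodes one element per iteration, so the whole array should not need to sit in memory at once.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"os"
	"strings"
)

// Message mirrors the element type used in the example above.
type Message struct {
	Name, Text string
}

// sample is a tiny stand-in for the contents of a large file.
const sample = `[
	{"Name": "Ed", "Text": "Knock knock."},
	{"Name": "Sam", "Text": "Who's there?"}
]`

// openInput (hypothetical helper) opens the file named by the first
// argument, or falls back to the built-in sample when none is given.
func openInput(args []string) (io.ReadCloser, error) {
	if len(args) == 0 {
		return io.NopCloser(strings.NewReader(sample)), nil
	}
	f, err := os.Open(args[0])
	if err != nil {
		return nil, err
	}
	return f, nil
}

// decodeArray (hypothetical helper) streams a top-level JSON array from r
// and calls handle once per decoded element.
func decodeArray(r io.Reader, handle func(Message) error) error {
	dec := json.NewDecoder(r)

	// Read the opening bracket and check that it really starts an array.
	tok, err := dec.Token()
	if err != nil {
		return fmt.Errorf("reading opening bracket: %w", err)
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return fmt.Errorf("expected '[', got %v", tok)
	}

	// Decode one element per iteration.
	for dec.More() {
		var m Message
		if err := dec.Decode(&m); err != nil {
			return fmt.Errorf("decoding element: %w", err)
		}
		if err := handle(m); err != nil {
			return err
		}
	}

	// Read the closing bracket.
	if _, err := dec.Token(); err != nil {
		return fmt.Errorf("reading closing bracket: %w", err)
	}
	return nil
}

func main() {
	in, err := openInput(os.Args[1:])
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	err = decodeArray(in, func(m Message) error {
		fmt.Printf("%v: %v\n", m.Name, m.Text)
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```

Passing the path as the first argument (for example `go run main.go file.json`) processes the file from the question; with no argument the sketch runs on the built-in sample.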

Answer 2

Score: 4

So, as commenters suggested, you could use the streaming API of encoding/json to read one string at a time:

    r := ... // get some io.Reader (e.g. open the big array file)
    d := json.NewDecoder(r)
    // read "["
    d.Token()
    // read strings one by one
    for d.More() {
        s, _ := d.Token()
        // do something with s, the newly read string
        fmt.Printf("read %q\n", s)
    }
    // (optionally) read "]"
    d.Token()

Note that for simplicity I've left out error handling, which you will need to add yourself.
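For reference, here is one way the omitted error handling could look. This is a sketch, not the answerer's code: forEachString and tokenString are hypothetical names, and it assumes the array holds only strings, so any other element is rejected with an error. The inline sample stands in for the big file.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"strings"
)

// tokenString (hypothetical helper) returns tok as a string, or an error
// if the token is anything else (a number, a nested '{' or '[', null, ...).
func tokenString(tok json.Token) (string, error) {
	s, ok := tok.(string)
	if !ok {
		return "", fmt.Errorf("expected a string, got %T (%v)", tok, tok)
	}
	return s, nil
}

// forEachString (hypothetical helper) reads a JSON array of strings from r
// one element at a time and passes each string to handle.
func forEachString(r io.Reader, handle func(string) error) error {
	d := json.NewDecoder(r)

	// Read "[" and make sure it really is the start of an array.
	tok, err := d.Token()
	if err != nil {
		return err
	}
	if delim, ok := tok.(json.Delim); !ok || delim != '[' {
		return fmt.Errorf("expected '[', got %v", tok)
	}

	// Read strings one by one.
	for d.More() {
		tok, err := d.Token()
		if err != nil {
			return err
		}
		s, err := tokenString(tok)
		if err != nil {
			return err
		}
		if err := handle(s); err != nil {
			return err
		}
	}

	// Read "]"; if the input ends before it, this call returns an error.
	_, err = d.Token()
	return err
}

func main() {
	// Assumption: a short inline array instead of the real file.
	r := strings.NewReader(`["alpha", "beta", "gamma"]`)
	err := forEachString(r, func(s string) error {
		fmt.Printf("read %q\n", s)
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```

Checking the token type matters here because Token returns whatever comes next, so an unexpected object or number in the array would otherwise be handed to the string-handling code.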

huangapple
  • Posted on August 4, 2015, 02:32:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/31794355.html