Which is the best way to process big JSON and replace specific values?

Question

I have a big JSON document (30 MB) that contains "title" fields in different objects. The structure of the JSON is unknown.

All that is known is that the JSON contains "title" keys, and that the string value of each such key must be translated into another value.

Here is an example:

    {
        "data1": {
            "title": "alpha",
            "color": "green"
        },
        "data2": {
            "someInnerData1": {
                "title": "beta",
                "color": "red"
            },
            "someInnerData2": {
                "someArray": [
                    {
                        "title": "gamma",
                        "color": "orange"
                    },
                    {
                        "title": "delta",
                        "color": "purple"
                    }
                ],
                "title": "epsilon"
            }
        }
    }

Replacement examples:

    "alpha" -> "Α"
    "beta" -> "B"
    etc.

What is the best way to achieve that in Go, without decoding into a struct?

P.S. The JSON is received from the network.

Answer 1

Score: 0

I would write a struct that implements the io.Reader interface and use that reader as the place where the translation happens: it reads the JSON input chunk by chunk, detects when it is on a key whose value needs to be changed, and translates that value on the fly.

Then you only need io.Copy to stream the whole input into another file (or any other io.Writer).

See the dependency graph of the text/transform package (golang.org/x/text/transform) for examples…

Answer 2

Score: 0

You can use a streaming JSON decoder like megajson:

    import (
        "bytes"
        "io"

        // import path assumed for megajson's scanner package
        "github.com/benbjohnson/megajson/scanner"
    )

    // TitleizeJSON copies JSON from r to w and converts the string values of
    // "title" keys to title case.
    func TitleizeJSON(r io.Reader, w io.Writer) error {
        // everything the scanner reads is also teed into buf, so the raw
        // bytes can be forwarded to w
        buf := new(bytes.Buffer)
        r = io.TeeReader(r, buf)
        s := scanner.NewScanner(r)
        var prevTok int
        var prevPos int
        wasTitle := false
        titleField := []byte("title")
        for {
            // read the next JSON token
            tok, data, err := s.Scan()
            if err == io.EOF {
                return nil
            } else if err != nil {
                return err
            }
            // calculate the position in the buffer
            pos := s.Pos()
            off := pos - prevPos
            switch tok {
            // if this is a string
            case scanner.TSTRING:
                // if the previous string before a ':' was "title",
                // titleize this one
                if prevTok == scanner.TCOLON && wasTitle {
                    // grab the first part of the buffer, skip the
                    // opening quote, then titleize the rest in place
                    // (bytes.Title is deprecated in current Go releases)
                    data = buf.Bytes()[:off][1:]
                    copy(data, bytes.Title(data))
                    wasTitle = false
                } else {
                    wasTitle = bytes.Equal(data, titleField)
                }
            }
            // now send the data to the writer
            data = buf.Bytes()
            _, err = w.Write(data[:off])
            if err != nil {
                return err
            }
            // reset the buffer (so it doesn't grow forever)
            nbuf := make([]byte, len(data)-off)
            copy(nbuf, data[off:])
            buf.Reset()
            buf.Write(nbuf)
            // for the next go-around
            prevTok = tok
            prevPos = pos
        }
    }

This should do the titleizing on the fly. The one case I can think of where it will have a problem is a really, really big string, since a whole token has to sit in the buffer before it is written out.

Posted by huangapple on 2015-09-01 17:28:38
Link to this article: https://go.coder-hub.com/32328030.html