Which is the best way to process big JSON and replace specific values?

Question

I have a large JSON document (about 30 MB) that contains "title" fields in different objects. The structure of the JSON is unknown.

All I know is that the JSON contains "title" keys, and that the string value of each such key must be translated into another value.

Here is an example:

{
    "data1": {
        "title": "alpha",
        "color": "green"
    },
    "data2": {
        "someInnerData1": {
            "title": "beta",
            "color": "red"
        },
        "someInnerData2": {
            "someArray": [
                {
                    "title": "gamma",
                    "color": "orange"
                },
                {
                    "title": "delta",
                    "color": "purple"
                }
            ],
            "title": "epsilon"
        }
    }
}

Example replacements:

"alpha" -> "Α"
"beta" -> "B"
etc.

What is the best way to achieve that in Go without decoding into a struct?

P.S. The JSON is received over the network.

Answer 1

Score: 0

I would write a type that implements the io.Reader interface and use it as the translation layer: it reads the JSON input chunk by chunk, detects when it is on a key whose value needs to change, and translates that value on the fly.

Then you just use io.Copy to copy the whole input to its destination.

See the text/transform package (golang.org/x/text/transform) and the packages that depend on it for examples of this pattern.

Answer 2

Score: 0

You can use a streaming JSON decoder like megajson:

import (
	"bytes"
	"io"

	// assumed import path for the megajson scanner package
	"github.com/benbjohnson/megajson/scanner"
)

// TitleizeJSON transforms the values of 'title' keys into Title Case.
func TitleizeJSON(r io.Reader, w io.Writer) error {
	// everything the scanner reads is also captured in buf
	buf := new(bytes.Buffer)
	r = io.TeeReader(r, buf)

	s := scanner.NewScanner(r)
	var prevTok int
	var prevPos int
	wasTitle := false
	titleField := []byte("title")
	for {
		// read the next JSON token
		tok, data, err := s.Scan()
		if err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		// calculate how many buffered bytes belong to this token
		pos := s.Pos()
		off := pos - prevPos

		switch tok {
		// if this is a string
		case scanner.TSTRING:
			// if the previous string before a ':' was 'title',
			// titleize this one
			if prevTok == scanner.TCOLON && wasTitle {
				// grab the first part of the buffer, skip its first
				// byte, then titleize the rest in place
				// (bytes.Title is deprecated since Go 1.18 but still works)
				data = buf.Bytes()[:off][1:]
				copy(data, bytes.Title(data))
				wasTitle = false
			} else {
				wasTitle = bytes.Equal(data, titleField)
			}
		}

		// now send the data to the writer
		data = buf.Bytes()
		_, err = w.Write(data[:off])
		if err != nil {
			return err
		}

		// reset the buffer (so it doesn't grow forever),
		// keeping any bytes the scanner has read ahead
		nbuf := make([]byte, len(data)-off)
		copy(nbuf, data[off:])
		buf.Reset()
		buf.Write(nbuf)

		// for the next go-around
		prevTok = tok
		prevPos = pos
	}
}

This should do the titleizing on the fly. The one case I can think of where it could have a problem is a really, really big string value, because a token has to be held in the buffer in full before it is written out.

Posted by huangapple on 2015-09-01 17:28:38. Original link: https://go.coder-hub.com/32328030.html