What is the best way to process a large JSON document and replace specific values?
Question
I have a large JSON document (30 MB) that contains "title" fields in various objects; the structure of the JSON is unknown.
All I know is that it contains "title" keys, and that the string value of each such key must be translated into another value.
{
  "data1": {
    "title": "alpha",
    "color": "green"
  },
  "data2": {
    "someInnerData1": {
      "title": "beta",
      "color": "red"
    },
    "someInnerData2": {
      "someArray": [
        {
          "title": "gamma",
          "color": "orange"
        },
        {
          "title": "delta",
          "color": "purple"
        }
      ],
      "title": "epsilon"
    }
  }
}
Example replacements:
"alpha" -> "Α"
"beta" -> "B"
etc.
What is the best way to achieve this in Go without decoding into a struct?
P.S. The JSON is received from the network.
Answer 1
Score: 0
I would make a struct that implements the io.Reader interface and use that reader as the translation layer: it reads the JSON input chunk by chunk, detects when it is on a key whose value needs to change, and translates that value on the fly.
Then you just use io.Copy to copy the translated stream to its destination, for example another file.
See the dependency graph of the text/transform package (golang.org/x/text/transform) for examples.
Answer 2
Score: 0
You can use a streaming JSON decoder like megajson:
import (
	"bytes"
	"io"

	"github.com/benbjohnson/megajson/scanner"
)

// TitleizeJSON transforms 'title' strings into Title Case.
func TitleizeJSON(r io.Reader, w io.Writer) error {
	// everything the scanner reads is mirrored into buf
	buf := new(bytes.Buffer)
	r = io.TeeReader(r, buf)
	s := scanner.NewScanner(r)
	var prevTok int
	var prevPos int
	wasTitle := false
	titleField := []byte("title")
	for {
		// read the next JSON token
		tok, data, err := s.Scan()
		if err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		// calculate how many bytes of the buffer this token covers
		pos := s.Pos()
		off := pos - prevPos
		switch tok {
		// if this is a string
		case scanner.TSTRING:
			// if the previous string before a ':' was 'title',
			// then titleize this one
			if prevTok == scanner.TCOLON && wasTitle {
				// grab the first part of the buffer, skip the
				// first byte, then titleize the rest in place
				// (bytes.Title is deprecated in newer Go releases)
				data = buf.Bytes()[:off][1:]
				copy(data, bytes.Title(data))
				wasTitle = false
			} else {
				wasTitle = bytes.Equal(data, titleField)
			}
		}
		// now send the data to the writer
		data = buf.Bytes()
		_, err = w.Write(data[:off])
		if err != nil {
			return err
		}
		// reset the buffer (so it doesn't grow forever)
		nbuf := make([]byte, len(data)-off)
		copy(nbuf, data[off:])
		buf.Reset()
		buf.Write(nbuf)
		// for the next go-around
		prevTok = tok
		prevPos = pos
	}
}
This should do the titleizing on the fly. The one case I can think of where it will have a problem is a very large string value, because the whole token has to sit in the buffer before it is written out.