Speeding up JSON parsing in Go
Question
We have transaction log files in which each transaction is a single line in JSON format. We often need to take selected parts of the data, perform a single time conversion, and feed results into another system in a specific format. I wrote a Python script that does this as we need, but I hoped that Go would be faster, and would give me a chance to start learning Go. So, I wrote the following:
    package main

    import "encoding/json"
    import "fmt"
    import "time"
    import "bufio"
    import "os"

    func main() {
        sep := ","
        reader := bufio.NewReader(os.Stdin)
        for {
            data, _ := reader.ReadString('\n')
            byt := []byte(data)
            var dat map[string]interface{}
            if err := json.Unmarshal(byt, &dat); err != nil {
                break
            }
            status := dat["status"].(string)
            a_status := dat["a_status"].(string)
            method := dat["method"].(string)
            path := dat["path"].(string)
            element_uid := dat["element_uid"].(string)
            time_local := dat["time_local"].(string)
            etime, _ := time.Parse("[02/Jan/2006:15:04:05 -0700]", time_local)
            fmt.Print(status, sep, a_status, sep, method, sep, path, sep, element_uid, sep, etime.Unix(), "\n")
        }
    }
That compiles without complaint, but I'm surprised at the lack of performance improvement. To test, I placed 2,000,000 lines of logs into a tmpfs (to ensure that disk I/O would not be a limitation) and compared the two versions of the script. My results:
$ time cat /mnt/ramdisk/logfile | ./stdin_conv > /dev/null
real 0m51.995s
$ time cat /mnt/ramdisk/logfile | ./stdin_conv.py > /dev/null
real 0m52.471s
$ time cat /mnt/ramdisk/logfile > /dev/null
real 0m0.149s
How can this be made faster? I have made some rudimentary efforts. The ffjson project, for example, proposes to create static functions that make reflection unnecessary; however, I have failed so far to get it to work, getting the error:
Error: Go Run Failed for: /tmp/ffjson-inception810284909.go
STDOUT:
STDERR:
/tmp/ffjson-inception810284909.go:9:2: import "json_parse" is a program, not an importable package
:
Besides, wouldn't what I have above be considered statically typed? Possibly not; I am still very wet behind the ears where Go is concerned. I have tried selectively disabling different attributes in the Go code to see if one is especially problematic. None has had an appreciable effect on performance. Any suggestions on improving performance, or is this simply a case where compiled languages have no substantial benefit over others?
Answer 1
Score: 5
Try using a type to remove all this unnecessary assignment and type assertion:

    type RenameMe struct {
        Status     string `json:"status"`
        Astatus    string `json:"a_status"`
        Method     string `json:"method"`
        Path       string `json:"path"`
        ElementUid string `json:"element_uid"`
        // Kept as a string: encoding/json only decodes time.Time from
        // RFC 3339 text, which this log format is not.
        TimeLocal string    `json:"time_local"`
        Etime     time.Time `json:"-"` // deal with this after the fact
    }

    data := &RenameMe{}
    if err := json.Unmarshal(byt, data); err != nil {
        break
    }
    // Plain assignment (=), since := cannot assign to a struct field.
    data.Etime, _ = time.Parse("[02/Jan/2006:15:04:05 -0700]", data.TimeLocal)
I'm not going to test this to ensure it outperforms your code but I bet it does by a large margin. Give it a try and let me know please.
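For reference, here is a minimal sketch of how the struct approach might look as a complete program. The names `logEntry` and `convertLine` are made up for illustration, and the buffered `bufio.Scanner`/`bufio.Writer` pair is my own addition rather than part of the suggestion above. It has not been benchmarked against the original.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// logEntry (hypothetical name) lists only the fields we extract;
// any other keys in the JSON object are ignored by encoding/json.
type logEntry struct {
	Status     string `json:"status"`
	AStatus    string `json:"a_status"`
	Method     string `json:"method"`
	Path       string `json:"path"`
	ElementUID string `json:"element_uid"`
	TimeLocal  string `json:"time_local"`
}

const timeLayout = "[02/Jan/2006:15:04:05 -0700]"

// convertLine decodes one JSON log line and returns the comma-separated record.
func convertLine(line []byte) (string, error) {
	var e logEntry
	if err := json.Unmarshal(line, &e); err != nil {
		return "", err
	}
	t, err := time.Parse(timeLayout, e.TimeLocal)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s,%s,%s,%s,%s,%d",
		e.Status, e.AStatus, e.Method, e.Path, e.ElementUID, t.Unix()), nil
}

func main() {
	in := bufio.NewScanner(os.Stdin)
	// Assumption: no single log line exceeds 1 MiB.
	in.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	out := bufio.NewWriter(os.Stdout)
	defer out.Flush()

	for in.Scan() {
		rec, err := convertLine(in.Bytes())
		if err != nil {
			// Unlike the original loop, skip bad lines instead of stopping.
			fmt.Fprintln(os.Stderr, "skipping line:", err)
			continue
		}
		fmt.Fprintln(out, rec)
	}
	if err := in.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
	}
}
```

Writing through a `bufio.Writer` avoids issuing one write call per output line, which the unbuffered `fmt.Print` in the question does. Whether that or the JSON decoding dominates the run time is something to measure rather than assume.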
Answer 2
Score: 1
http://jsoniter.com/ declares itself to be the fastest JSON parser; Go and Java implementations are provided. Two types of API can be used, and defining the JSON object's structure in advance is optional.
Answer 3
Score: 0
Check https://github.com/pquerna/ffjson
I saw 3x improvements over the standard JSON marshal/unmarshal methods in the standard library. It does so by rewriting the source and removing the need for reflection.
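Speedup figures like this depend heavily on the data, so it is probably worth measuring on your own log lines before switching libraries. Below is a rough timing sketch using only the standard library. It compares decoding into a generic map with decoding into a struct. The sample line and the names `entry`, `pathViaMap`, `pathViaStruct` and `timeIt` are invented for illustration, and this is a crude wall-clock loop, not a rigorous benchmark.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// sample is a made-up log line; substitute one of your own.
var sample = []byte(`{"status":"200","a_status":"200","method":"GET","path":"/index.html","element_uid":"abc123","time_local":"[01/Jan/2015:00:00:00 +0000]"}`)

// entry (hypothetical) declares only the fields this sketch reads.
type entry struct {
	Status string `json:"status"`
	Path   string `json:"path"`
}

// pathViaMap decodes into a generic map and type-asserts the field,
// as the code in the question does.
func pathViaMap(line []byte) (string, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(line, &m); err != nil {
		return "", err
	}
	p, ok := m["path"].(string)
	if !ok {
		return "", fmt.Errorf("path missing or not a string")
	}
	return p, nil
}

// pathViaStruct decodes into a typed struct.
func pathViaStruct(line []byte) (string, error) {
	var e entry
	if err := json.Unmarshal(line, &e); err != nil {
		return "", err
	}
	return e.Path, nil
}

// timeIt runs f on the sample line n times and returns the elapsed wall-clock time.
func timeIt(n int, f func([]byte) (string, error)) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		if _, err := f(sample); err != nil {
			panic(err)
		}
	}
	return time.Since(start)
}

func main() {
	const n = 100000
	fmt.Println("map:   ", timeIt(n, pathViaMap))
	fmt.Println("struct:", timeIt(n, pathViaStruct))
}
```

A generated decoder (ffjson or similar) could be timed the same way by adding a third function with the same signature.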