Go JSON decoding is very slow. What would be a better way to do it?
Question
I am using Go, Revel WAF and Redis.
I have to store large JSON data in Redis (maybe 20 MB).
json.Unmarshal() takes roughly 5 seconds. What would be a better way to do it?
I tried JsonLib, encode/json, ffjson, megajson, but none of them were fast enough.
I thought about using groupcache, but the JSON is updated in real time.
This is the sample code:

package main

import (
    "github.com/garyburd/redigo/redis"
    json "github.com/pquerna/ffjson/ffjson"
)

func main() {
    c, err := redis.Dial("tcp", ":6379")
    if err != nil {
        panic(err)
    }
    defer c.Close()

    pointTable, err := redis.String(c.Do("GET", "data"))
    if err != nil {
        panic(err)
    }

    var hashPoint map[string][]float64
    json.Unmarshal([]byte(pointTable), &hashPoint) // Problem!!!
}
Answer 1
Score: 9
Parsing large JSON data does seem to be slower than it should be. It would be worthwhile to pinpoint the cause and submit a patch to the Go authors.
In the meantime, if you can avoid JSON and use a binary format, you will not only avoid this issue; you will also gain the time your code now spends parsing ASCII decimal representations of numbers into their binary IEEE 754 equivalents (and possibly introducing rounding errors while doing so).
If both your sender and receiver are written in Go, I suggest using Go's binary format: gob.
Doing a quick test, generating a map with 2000 entries, each a slice with 1050 simple floats, gives me 20 MB of JSON, which takes 1.16 sec to parse on my machine.
For these quick benchmarks, I take the best of three runs, but I make sure to only measure the actual parsing time, with t0 := time.Now() before the Unmarshal call and printing time.Now().Sub(t0) after it.
Using GOB, the same map results in 18 MB of data, which takes 115 ms to parse: one tenth the time.
Your results will vary depending on how many actual floats you have there. If your floats have many significant digits, deserving their float64 representation, then 20 MB of JSON will contain far fewer floats than my two million. In that case the difference between JSON and GOB will be even starker.
BTW, this proves that the problem lies indeed in the JSON parser, not in the amount of data to parse, nor in the memory structures to create (because both tests are parsing ~20 MB of data and recreating the same slices of floats). Replacing all the floats with strings in the JSON gives me a parsing time of 1.02 sec, confirming that the conversion from string representation to binary floats does take a certain time (compared to just moving bytes around) but is not the main culprit.
If the sender and the parser are not both Go, or if you want to squeeze out even more performance than GOB gives you, use your own customised binary format, either with Protocol Buffers or manually with "encoding/binary" and friends.
Answer 2
Score: 1
Try https://github.com/json-iterator/go
I got a 2x decoding speed-up compared to the official one; a further benefit is that jsoniter's APIs are compatible with encoding/json.
Answer 3
Score: 0
Try fastjson. It is optimized for speed and usually parses JSON much faster than the standard encoding/json. Additionally, fastjson doesn't need structs adhering to a JSON schema - a single parser may parse multiple JSONs with distinct schemas.
Comments