Go vs JavaScript JSON parsing

Question

Recently, I needed to parse the JSON that the Chrome web browser produces when you record events in its dev tools, and get some timing data out of it. Chrome can produce a pretty large amount of data in a small amount of time, so the Ruby parser I originally built was quite slow.

Since I'm learning Go, I decided to write scripts in both Go and JavaScript/Node and compare them.

The simplest possible form of the JSON file is what I have in this Gist. It contains an event representing the request sent to fetch a page, and the event representing the response. Typically, there's a huge amount of extra data to sift through. That's its own problem, but not what I'm worried about in this question.

The JavaScript script that I wrote is here, and the Go program I wrote is here. This is the first useful thing I've written in Go, so I'm sure it's all sorts of bad. However, one thing I noticed is that it's much slower than JavaScript at parsing a large JSON file.

Time with a 119 MB JSON file in Go:

$ time ./parse data.json
= 22 Requests
  Min Time:      0.77
  Max Time:      0.77
  Average Time:  0.77
./gm data.json  4.54s user 0.16s system 99% cpu 4.705 total

Time with a 119 MB JSON file in JavaScript/Node:

$ time node parse.js data.json
= 22 Requests
  Min Time: 0.77
  Max Time: 0.77
  Avg Time: 0.77
node jm.js data.json  1.73s user 0.24s system 100% cpu 1.959 total

(The min/max/average times are all identical in this example because I duplicated JSON objects so as to have a very large data set, but that's irrelevant.)

I'm curious whether JavaScript/Node is simply much faster at parsing JSON (which wouldn't be particularly surprising, I guess), or whether I'm doing something totally wrong in the Go program. I'm also curious what I'm doing wrong in the Go program in general, because I'm sure there's plenty wrong with it.

Note that while these two scripts do more than parsing, it's definitely json.Unmarshal() in Go that is adding lots of time in the program.
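
One way to check a claim like this is to time the json.Unmarshal call on its own, separate from file reading and post-processing. The following is only a minimal sketch: the Sample type, the timeUnmarshal and buildInput helpers, and the generated input are all hypothetical and are not taken from the program linked in the question.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"time"
)

// Sample is a hypothetical record type used only for this illustration.
type Sample struct {
	ID int `json:"id"`
}

// timeUnmarshal reports how long a single json.Unmarshal call takes.
func timeUnmarshal(data []byte, v interface{}) (time.Duration, error) {
	start := time.Now()
	err := json.Unmarshal(data, v)
	return time.Since(start), err
}

// buildInput generates a JSON array of n small records in memory,
// so the measurement does not include any disk I/O.
func buildInput(n int) []byte {
	var buf bytes.Buffer
	buf.WriteByte('[')
	for i := 0; i < n; i++ {
		if i > 0 {
			buf.WriteByte(',')
		}
		fmt.Fprintf(&buf, `{"id":%d}`, i)
	}
	buf.WriteByte(']')
	return buf.Bytes()
}

func main() {
	data := buildInput(100000)

	var samples []Sample
	elapsed, err := timeUnmarshal(data, &samples)
	if err != nil {
		panic(err)
	}
	fmt.Printf("decoded %d records in %v\n", len(samples), elapsed)
}
```

Comparing this figure with the total reported by `time` gives a rough idea of how much of the run is spent decoding.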

Update

I added a Ruby script:

$ ruby parse.rb
= 22 Requests
  Min Time: 0.77
  Max Time: 0.77
  Avg Time: 0.77
ruby parse.rb  4.82s user 0.82s system 99% cpu 5.658 total

Answer 1

Score: 10

With Go, you are parsing the JSON into statically-typed structures. With JS and Ruby, you are parsing it into hash tables.

In order to parse JSON into the structures that you defined, the json package needs to find out the names and types of their fields. To do this, it uses the reflect package, which is much slower than accessing those fields directly.
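
To make the two decoding targets concrete, here is a rough sketch that decodes the same input once into a struct type and once into generic maps. The Event type, its field names, and the sample input are assumptions made for illustration; they are not the layout of the Chrome data from the question.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical record shape; the field names are assumptions.
type Event struct {
	Type      string  `json:"type"`
	StartTime float64 `json:"startTime"`
}

// decodeTyped fills a slice of structs. encoding/json inspects the Event
// type through reflection to match JSON keys to struct fields.
func decodeTyped(data []byte) ([]Event, error) {
	var events []Event
	err := json.Unmarshal(data, &events)
	return events, err
}

// decodeGeneric fills a slice of generic maps, which is closer in spirit
// to the hash-table result the answer describes for JS and Ruby.
func decodeGeneric(data []byte) ([]map[string]interface{}, error) {
	var events []map[string]interface{}
	err := json.Unmarshal(data, &events)
	return events, err
}

func main() {
	data := []byte(`[{"type":"request","startTime":1.5},{"type":"response","startTime":2.25}]`)

	typed, err := decodeTyped(data)
	if err != nil {
		panic(err)
	}
	generic, err := decodeGeneric(data)
	if err != nil {
		panic(err)
	}

	// Struct fields are accessed directly; map values need a key lookup.
	fmt.Println(typed[0].Type, typed[1].StartTime)
	fmt.Println(generic[0]["type"], generic[1]["startTime"])
}
```

This sketch only shows the two shapes of the result. It makes no claim about which of the two is faster to decode in Go; that would need to be measured on the real data.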

Depending on what you do with the data after you parse it, the extra parsing time may pay for itself. The Go data structures use less memory than hash tables, and they are much faster to access. So if you do a lot with the data, the savings on processing time may outweigh the extra parsing time.
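
The difference in access patterns can be sketched as follows. The Timing type, the "start" and "end" keys, and both helper functions are hypothetical names chosen for this example, not part of the original program.

```go
package main

import "fmt"

// Timing is a hypothetical struct holding two timestamps in seconds.
type Timing struct {
	Start float64
	End   float64
}

// totalTyped sums durations using direct struct field access.
func totalTyped(ts []Timing) float64 {
	var total float64
	for _, t := range ts {
		total += t.End - t.Start
	}
	return total
}

// totalGeneric does the same over generic maps. Every value needs a key
// lookup plus a type assertion before it can be used as a number.
func totalGeneric(ts []map[string]interface{}) float64 {
	var total float64
	for _, t := range ts {
		start, okStart := t["start"].(float64)
		end, okEnd := t["end"].(float64)
		if !okStart || !okEnd {
			continue // skip records with missing or non-numeric fields
		}
		total += end - start
	}
	return total
}

func main() {
	typed := []Timing{{Start: 1.0, End: 1.5}, {Start: 2.0, End: 2.25}}
	generic := []map[string]interface{}{
		{"start": 1.0, "end": 1.5},
		{"start": 2.0, "end": 2.25},
	}
	fmt.Println(totalTyped(typed), totalGeneric(generic))
}
```

The struct version also lets the compiler catch a misspelled field name, whereas a misspelled map key simply yields a missing value at run time.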

huangapple
  • Posted on July 12, 2013, 03:33:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/17601857.html