Efficiently count the number of JSON objects in a file

Question

I need to get the number of JSON objects in a given file. The file contains an array of JSON objects. Counting a file with 1 million objects takes approximately 150-180 seconds. Is there a way to optimize the code below to get the count faster?

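import (
	"bufio"
	"encoding/json"
	"os"
)

// Count returns the number of elements in the top-level JSON array in file.
// Every element is fully decoded into a map, which is what makes it slow.
func Count(file string) (int, error) {
	f, err := os.Open(file)
	if err != nil {
		return -1, err
	}
	defer f.Close()
	dec := json.NewDecoder(bufio.NewReader(f))
	// Consume the opening '[' of the array.
	_, e := dec.Token()
	if e != nil {
		return -1, e
	}
	var count int
	for dec.More() {
		var tempMap map[string]interface{}
		readErr := dec.Decode(&tempMap)
		if readErr != nil {
			return -1, readErr
		}
		count++
	}
	return count, nil
}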

Answer 1

Score: 1

Speed things up by counting start object delimiters instead of decoding to Go values.

Based on the code in the question, it looks like your goal is to count objects at the first level of nesting in the document. Here's code that does that:

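func Count(r io.Reader) (int, error) {
	dec := json.NewDecoder(r)

	nest := 0  // current nesting depth (arrays and objects)
	count := 0 // objects seen at depth 1

	for {
		t, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return -1, err
		}
		switch t {
		case json.Delim('['):
			// The top-level array must also add a level; otherwise its
			// elements would be seen at depth 0 and never counted.
			nest++
		case json.Delim('{'):
			if nest == 1 {
				count++
			}
			nest++
		case json.Delim(']'), json.Delim('}'):
			nest--
		}
	}
	return count, nil
}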

If your goal is to count all objects, remove all uses of nest from the code above:

func Count(r io.Reader) (int, error) {
	dec := json.NewDecoder(r)

	count := 0

	for {
		t, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return -1, err
		}
		switch t {
		case json.Delim('{'):
			count++
		}
	}
	return count, nil
}

huangapple
  • Posted on August 25, 2021 at 13:42:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/68917401.html