为什么在使用range时,我会看到某些大小的地图出现减速现象?

huangapple go评论83阅读模式
英文:

Why am I seeing slowdowns for certain size maps when using range?

问题

在我的电脑上,当我处理特定大小的地图时,我看到每秒读取速度下降,但它并不是线性下降的。实际上,性能立即下降,然后随着大小的增加慢慢恢复:

$ go run map.go 425984 1 425985
    273578 wps ::   18488800 rps
    227909 wps ::    1790311 rps

$ go run map.go 400000 10000 500000
    271355 wps ::   18060069 rps
    254804 wps ::   18404288 rps
    267067 wps ::   18673778 rps
    216442 wps ::    1984859 rps
    246724 wps ::    2461281 rps
    282316 wps ::    3634125 rps
    216615 wps ::    4989007 rps
    276769 wps ::    6972233 rps
    212019 wps ::    9756720 rps
    286027 wps ::   14488593 rps
    227073 wps ::   17309822 rps

我预期写入偶尔会变慢(因为底层数据结构被重新调整大小),但读取对大小敏感(并且相差一个数量级)让我感到惊讶。

这是我用来测试的代码:

package main

import (
	"bytes"
	"fmt"
	"math/rand"
	"os"
	"strconv"
	"time"
)

func main() {
	start, _ := strconv.ParseInt(os.Args[1], 10, 64)
	step, _ := strconv.ParseInt(os.Args[2], 10, 64)
	end, _ := strconv.ParseInt(os.Args[3], 10, 64)
	for n := start; n <= end; n += step {
		runNTimes(n)
	}
}

func randomString() string {
	var b bytes.Buffer

	for i := 0; i < 16; i++ {
		b.WriteByte(byte(0x61 + rand.Intn(26)))
	}

	return b.String()
}

func perSecond(end time.Time, start time.Time, n int64) float64 {
	return float64(n) / end.Sub(start).Seconds()
}

func runNTimes(n int64) {
	m := make(map[string]int64)

	startAdd := time.Now()
	for i := int64(0); i < n; i++ {
		m[randomString()]++
	}
	endAdd := time.Now()

	totalInMap := int64(0)
	startRead := time.Now()
	for _, v := range m {
		//绕过未使用的变量错误,
		//v应该始终大于0
		if v != 0 {
			totalInMap++
		} 
	}
	endRead := time.Now()

	fmt.Printf("%10.0f wps :: %10.0f rps\n",
		perSecond(endAdd, startAdd, n),
		perSecond(endRead, startRead, totalInMap),
	)
}
英文:

On my computer I am seeing a read per second drop when I hit certain sizes of maps, but it does not degrade in a linear fashion. In fact, the performance drops off immediately and then comes back slowly as the size increases:

$ go run map.go 425984 1 425985
    273578 wps ::   18488800 rps
    227909 wps ::    1790311 rps

$ go run map.go 400000 10000 500000
    271355 wps ::   18060069 rps
    254804 wps ::   18404288 rps
    267067 wps ::   18673778 rps
    216442 wps ::    1984859 rps
    246724 wps ::    2461281 rps
    282316 wps ::    3634125 rps
    216615 wps ::    4989007 rps
    276769 wps ::    6972233 rps
    212019 wps ::    9756720 rps
    286027 wps ::   14488593 rps
    227073 wps ::   17309822 rps

I expected the writes to slow down occasionally (as the underlying data structure was resized), but the reads being sensitive to size (and by an order of magnitude) surprised me.

Here is the code I am using to test this:

package main

import (
	&quot;bytes&quot;
	&quot;fmt&quot;
	&quot;math/rand&quot;
	&quot;os&quot;
	&quot;strconv&quot;
	&quot;time&quot;
)

func main() {
	start, _ := strconv.ParseInt(os.Args[1], 10, 64)
	step, _ := strconv.ParseInt(os.Args[2], 10, 64)
	end, _ := strconv.ParseInt(os.Args[3], 10, 64)
	for n := start; n &lt;= end; n += step {
		runNTimes(n)
	}
}

func randomString() string {
	var b bytes.Buffer

	for i := 0; i &lt; 16; i++ {
		b.WriteByte(byte(0x61 + rand.Intn(26)))
	}

	return b.String()
}

func perSecond(end time.Time, start time.Time, n int64) float64 {
	return float64(n) / end.Sub(start).Seconds()
}

func runNTimes(n int64) {
	m := make(map[string]int64)

	startAdd := time.Now()
	for i := int64(0); i &lt; n; i++ {
		m[randomString()]++
	}
	endAdd := time.Now()

	totalInMap := int64(0)
	startRead := time.Now()
	for _, v := range m {
		//get around unused variable error,
		//v should always be &gt; 0
		if v != 0 {
			totalInMap++
		} 
	}
	endRead := time.Now()

	fmt.Printf(&quot;%10.0f wps :: %10.0f rps\n&quot;,
		perSecond(endAdd, startAdd, n),
		perSecond(endRead, startRead, totalInMap),
	)
}

答案1

得分: 2

你的代码并不能真正衡量地图性能。你的代码测量了一种奇怪的混合操作,包括随机数生成(不保证是恒定时间操作)、地图范围操作(不保证是恒定时间操作,也不保证与普通地图访问性能有可预测的关系),甚至可能测量了干扰性的“停止-世界”垃圾回收。

  • 不要编写自己的基准功能,使用 testing 包 提供的功能,它更好。例如,它从不依赖样本大小等于 1(就像你的代码错误地做的那样)。
  • 同样,在测量时间之外生成所有测试键。
  • 然后,为了最小化垃圾回收的影响,执行 runtime.GC()
  • 最后,使用 B.StartTimer 并执行要测量的代码路径。

无论如何,你要正确测量的任何内容都不会太有用。地图代码是 Go 运行时的实现细节,随时可能发生变化。据我所知,当前的实现与 Go 1 中的完全不同。

最后:是的,一个经过良好调优的地图实现可能对许多因素敏感,包括其中的项目数量、大小和/或类型,甚至架构、CPU 步进和缓存大小都可能对其起作用。

英文:

Your code does not measure map performance per se. Your code measures a strange mix of performing random number generation (not guaranteed to be constant time operation), ranging a map (not guaranteed to be a constant time operation and not guaranteed to be in any predictable way related to plain map access performance) and perhaps it even measures interfering "stop-the-world" garbage collecting.

  • Don't write your own bench functionality, use what package "testing" offers, it's much better. For example, it never relies on sample size == 1 (like your code wrongly does).
  • Also, generate all the test keys outside of measured time.
  • Then, to minimize the GC effects, perform runtime.GC().
  • Now finaly use B.StartTimer and execute the measured code path.

Anyway, whatever you are going to properly measure won't be too much useful either. The map code is an implementation detail of the Go runtime and can change any time. AFAIK, the current implementation is completely different than what was in Go 1.

And finally: Yes, well tuned map implementation is expected to be potentially sensitive to many things, including the number and/or size and/or type of items in it - even the architecture and CPU stepping and cache size can play a role in this.

huangapple
  • 本文由 发表于 2013年7月28日 23:30:01
  • 转载请务必保留本文链接:https://go.coder-hub.com/17909822.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定