Go:多次 len() 调用与性能?

huangapple go评论109阅读模式
英文:

Go: multiple len() calls vs performance?

问题

目前我正在实现一些排序算法。由于算法的特性,会对一些数组/切片的长度进行多次调用,使用 len() 方法。

现在,给出以下代码,是归并排序算法的一部分:

for len(left) > 0 || len(right) > 0 {
    if len(left) > 0 && len(right) > 0 {
        if left[0] <= right[0] {
            result = append(result, left[0])
            left = left[1:len(left)]
        } else {
            result = append(result, right[0])
            right = right[1:len(right)]
        }
    } else if len(left) > 0 {
        result = append(result, left[0])
        left = left[1:len(left)]
    } else if len(right) > 0 {
        result = append(result, right[0])
        right = right[1:len(right)]
    }
}

我的问题是:这些多次的 len() 调用会对算法的性能产生负面影响吗?是不是最好为 rightleft 切片创建一个临时变量来存储长度?或者编译器会自动处理这个问题?

英文:

At the moment I am implementing some sorting algorithms. As it's in the nature of algorithms, there are a lot of calls on the length of some arrays/slices using the len() method.

Now, given the following code for a (part of) the Mergesort algorithm:

  for len(left) &gt; 0 || len(right) &gt; 0 {
		if len(left) &gt; 0 &amp;&amp; len(right) &gt; 0 {
			if left[0] &lt;= right[0] {
				result = append(result, left[0])
				left = left[1:len(left)]
			} else {
				result = append(result, right[0])
				right = right[1:len(right)]
			}
		} else if len(left) &gt; 0 {
			result = append(result, left[0])
			left = left[1:len(left)]
		} else if len(right) &gt; 0 {
			result = append(result, right[0])
			right = right[1:len(right)]
		}
	}

My question is: Do these multiple len() calls affect the performance of the algorithm negatively? Is it better to make a temporary variable for the length of the right and left slice? Or does the compiler does this itself?

答案1

得分: 78

有两种情况:

  • **本地切片:**长度将被缓存,没有额外开销
  • 全局切片或通过引用传递:长度无法缓存,存在开销

本地切片没有额外开销

对于在本地定义的切片,长度会被缓存,因此没有运行时开销。您可以在以下程序的汇编代码中看到这一点:

func generateSlice(x int) []int {
	return make([]int, x)
}

func main() {
	x := generateSlice(10)
	println(len(x))
}

使用go tool 6g -S test.go编译,可以得到以下行,以及其他内容:

MOVQ	&quot;&quot;.x+40(SP),BX
MOVQ	BX,(SP)
// ...
CALL	,runtime.printint(SB)

这里发生的情况是,第一行通过获取距离x开头40字节处的值来检索x的长度,并且最重要的是将此值缓存在BX中,然后用于每次出现的len(x)。偏移量的原因是数组具有以下结构(来源):

typedef	struct
{				// must not move anything
    uchar	array[8];	// pointer to data
    uchar	nel[4];		// number of elements
    uchar	cap[4];		// allocated number of elements
} Array;

nel是由len()访问的内容。您还可以在代码生成中看到这一点。

全局和引用切片有开销

对于共享值,无法缓存长度,因为编译器必须假设切片在调用之间发生了变化。因此,编译器必须编写直接访问长度属性的代码。例如:

func accessLocal() int {
	a := make([]int, 1000) // local
	count := 0
	for i := 0; i &lt; len(a); i++ {
		count += len(a)
	}
	return count
}

var ag = make([]int, 1000) // pseudo-code

func accessGlobal() int {
	count := 0
	for i := 0; i &lt; len(ag); i++ {
		count += len(ag)
	}
	return count
}

比较这两个函数的汇编代码可以得出一个关键的区别,即一旦变量是全局的,对nel属性的访问将不再被缓存,从而导致运行时开销:

// accessLocal
MOVQ	&quot;&quot;.a+8048(SP),SI // 在 SI 中缓存长度
// ...
CMPQ	SI,AX            // i &lt; len(a)
// ...
MOVQ	SI,BX
ADDQ	CX,BX
MOVQ	BX,CX            // count += len(a)

// accessGlobal
MOVQ	&quot;&quot;.ag+8(SB),BX
CMPQ	BX,AX            // i &lt; len(ag)
// ...
MOVQ	&quot;&quot;.ag+8(SB),BX
ADDQ	CX,BX
MOVQ	BX,CX            // count += len(ag)
英文:

There are two cases:

  • Local slice: length will be cached and there is no overhead
  • Global slice or passed (by reference): length cannot be cached and there is overhead

No overhead for local slices

For locally defined slices the length is cached, so there is no runtime overhead. You can see this in the assembly of the following program:

func generateSlice(x int) []int {
	return make([]int, x)
}

func main() {
	x := generateSlice(10)
	println(len(x))
}

Compiled with go tool 6g -S test.go this yields, amongst other things, the following lines:

MOVQ	&quot;&quot;.x+40(SP),BX
MOVQ	BX,(SP)
// ...
CALL	,runtime.printint(SB)

What happens here is that the first line retrieves the length of x by getting the value located 40 bytes from the beginning of x and most importantly caches this value in BX, which is then used for every occurrence of len(x). The reason for the offset is that an array has the following structure (source):

typedef	struct
{				// must not move anything
    uchar	array[8];	// pointer to data
    uchar	nel[4];		// number of elements
    uchar	cap[4];		// allocated number of elements
} Array;

nel is what is accessed by len(). You can see this in the code generation as well.

Global and referenced slices have overhead

For shared values caching of the length is not possible since the compiler has to assume that the slice changes between calls. Therefore the compiler has to write code that accesses the length attribute directly every time. Example:

func accessLocal() int {
	a := make([]int, 1000) // local
	count := 0
	for i := 0; i &lt; len(a); i++ {
		count += len(a)
	}
	return count
}

var ag = make([]int, 1000) // pseudo-code

func accessGlobal() int {
	count := 0
	for i := 0; i &lt; len(ag); i++ {
		count += len(ag)
	}
	return count
}

Comparing the assembly of both functions yields the crucial difference that as soon as the variable is global the access to the nel attribute is not cached anymore and there will be a runtime overhead:

// accessLocal
MOVQ	&quot;&quot;.a+8048(SP),SI // cache length in SI
// ...
CMPQ	SI,AX            // i &lt; len(a)
// ...
MOVQ	SI,BX
ADDQ	CX,BX
MOVQ	BX,CX            // count += len(a)

// accessGlobal
MOVQ	&quot;&quot;.ag+8(SB),BX
CMPQ	BX,AX            // i &lt; len(ag)
// ...
MOVQ	&quot;&quot;.ag+8(SB),BX
ADDQ	CX,BX
MOVQ	BX,CX            // count += len(ag)

答案2

得分: 11

尽管你得到了很好的答案,但如果不断调用len(a),我的性能会变差,例如在这个测试中:http://play.golang.org/p/fiP1Sy2Hfk

当运行go test -bench=.时,我得到的结果是:

BenchmarkTest1 5000000 668 ns/op
BenchmarkTest2 5000000 402 ns/op

所以显然存在一个惩罚,可能是因为编译器在编译时做出了更差的优化。

英文:

Despite the good answers you are getting, I'm getting poorer performance if calling len(a) constantly, for example in this test http://play.golang.org/p/fiP1Sy2Hfk

package main

import &quot;testing&quot;

func BenchmarkTest1(b *testing.B) {
	a := make([]int, 1000)
	for i := 0; i &lt; b.N; i++ {
		count := 0
		for i := 0; i &lt; len(a); i++ {
			count += len(a)
		}
	}
}

func BenchmarkTest2(b *testing.B) {
	a := make([]int, 1000)
	for i := 0; i &lt; b.N; i++ {
		count := 0
		lena := len(a)
		for i := 0; i &lt; lena; i++ {
			count += lena
		}
	}
}

When run as go test -bench=. I get:

BenchmarkTest1   5000000               668 ns/op
BenchmarkTest2   5000000               402 ns/op

So there is clearly a penalty here, possibly because the compiler is making worse optimizations in compile-time.

答案3

得分: 1

希望在最新版本的Go中有所改进

go版本 go1.16.7 linux/amd64

go操作系统:linux
go架构:amd64
包:001_test
处理器:第11代Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkTest1-8 4903609 228.8 ns/op
BenchmarkTest2-8 5280086 229.9 ns/op

英文:

Hope things got improved in the latest version of Go

go version go1.16.7 linux/amd64

goos: linux
goarch: amd64
pkg: 001_test
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkTest1-8   	 4903609	       228.8 ns/op
BenchmarkTest2-8   	 5280086	       229.9 ns/op

huangapple
  • 本文由 发表于 2014年10月29日 23:36:36
  • 转载请务必保留本文链接:https://go.coder-hub.com/26634554.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定