Go切片调整大小的不同性能表现

huangapple go评论76阅读模式
英文:

Different performances in Go slices resize

问题

我正在花时间研究Go语言的内部机制,并且我写了一个使用切片实现的自己的。正如一个Reddit用户在这篇帖子中正确指出的,并且另一个用户在这个SO答案中概述的那样,Go语言已经尝试优化切片的调整大小。

然而,事实证明,与使用默认的切片调整大小相比,我自己实现的切片增长方法更具性能优势。

这是我用于保存栈的结构:

type Stack struct {
    slice []interface{}
    blockSize int
}

const s_DefaultAllocBlockSize = 20;

这是我自己实现的Push方法:

func (s *Stack) Push(elem interface{}) {
    if len(s.slice) + 1 == cap(s.slice) {
        slice := make([]interface{}, 0, len(s.slice) + s.blockSize)
        copy(slice, s.slice)
        s.slice = slice
    }

    s.slice = append(s.slice, elem)
}

这是普通的实现:

func (s *Stack) Push(elem interface{}) {
    s.slice = append(s.slice, elem)
}

通过使用Go语言的testing包实现的基准测试,我的自己实现的性能如下:

Benchmark_PushDefaultStack	20000000	        87.7 ns/op	      24 B/op	       1 allocs/op

而依赖于普通的append的结果如下:

Benchmark_PushDefaultStack	10000000	       209 ns/op	      90 B/op	       1 allocs/op

我在一台2011年早期的Mac Book Pro上运行测试,配置为2.3 GHz Intel Core i5处理器,8GB 1333MHz DDR3内存。

编辑
实际问题是:我的实现真的比默认的append行为更快吗?还是我没有考虑到某些因素?

英文:

I'm spending some time experimenting with Go's internals and I ended up writing my own implementation of a stack using slices.
As correctly pointed out by a reddit user in this post and as outlined by another user in this SO answer Go already tries to optimise slices resize.

Turns out, though, that I rather have performance gains using my own implementation of slice growing rather than sticking with the default one.

This is the structure I use for holding the stack:

type Stack struct {
    slice []interface{}
    blockSize int
}

const s_DefaultAllocBlockSize = 20;

This is my own implementation of the Push method

func (s *Stack) Push(elem interface{}) {
    if len(s.slice) + 1 == cap(s.slice) {
	    slice := make([]interface{}, 0, len(s.slice) + s.blockSize)
	    copy(slice, s.slice)
	    s.slice = slice
    }

    s.slice = append(s.slice, elem)
}

This is a plain implementation

func (s *Stack) Push(elem interface{}) {
    s.slice = append(s.slice, elem)
}

Running the benchmarks I've implemented using Go's testing package my own implementation performs this way:

Benchmark_PushDefaultStack	20000000	        87.7 ns/op	      24 B/op	       1 allocs/op

While relying on the plain append the results are the following

Benchmark_PushDefaultStack	10000000	       209 ns/op	      90 B/op	       1 allocs/op

The machine I run tests on is an early 2011 Mac Book Pro, 2.3 GHz Intel Core i5 with 8GB of RAM 1333MHz DDR3

EDIT
The actual question is: is my implementation really faster than the default append behavior? Or am I not taking something into account?

答案1

得分: 4

阅读您的代码、测试、基准测试和结果后,很容易看出它们存在缺陷。完整的代码审查超出了StackOverflow的范围。

一个具体的错误。

// Push pushes a new element to the stack
func (s *Stack) Push(elem interface{}) {
    if len(s.slice)+1 == cap(s.slice) {
        slice := make([]interface{}, 0, len(s.slice)+s.blockSize)
        copy(slice, s.slice)
        s.slice = slice
    }

    s.slice = append(s.slice, elem)
}

应该改为

// Push pushes a new element to the stack
func (s *Stack) Push(elem interface{}) {
    if len(s.slice)+1 == cap(s.slice) {
        slice := make([]interface{}, len(s.slice), len(s.slice)+s.blockSize)
        copy(slice, s.slice)
        s.slice = slice
    }

    s.slice = append(s.slice, elem)
}

复制切片

函数copy将切片元素从源src复制到目标dst,并返回复制的元素数。复制的元素数是len(src)len(dst)中的最小值。

您复制了0,应该复制len(s.slice)

如预期的那样,您的Push算法非常慢:

append:

Benchmark_PushDefaultStack-4	 2000000	       941 ns/op	      49 B/op	       1 allocs/op

alediaferia:

Benchmark_PushDefaultStack-4	  100000	   1246315 ns/op	   42355 B/op	       1 allocs/op

这是append的工作方式:append复杂度

还有其他问题。您的基准测试结果通常无效。

英文:

Reading your code, tests, benchmarks, and results it's easy to see that they are flawed. A full code review is beyond the scope of StackOverflow.

One specific bug.

// Push pushes a new element to the stack
func (s *Stack) Push(elem interface{}) {
	if len(s.slice)+1 == cap(s.slice) {
		slice := make([]interface{}, 0, len(s.slice)+s.blockSize)
		copy(slice, s.slice)
		s.slice = slice
	}

	s.slice = append(s.slice, elem)
}

Should be

// Push pushes a new element to the stack
func (s *Stack) Push(elem interface{}) {
	if len(s.slice)+1 == cap(s.slice) {
		slice := make([]interface{}, len(s.slice), len(s.slice)+s.blockSize)
		copy(slice, s.slice)
		s.slice = slice
	}

	s.slice = append(s.slice, elem)
}

> copying
> slices

>
> The function copy copies slice elements from a source src to a
> destination dst and returns the number of elements copied. The
> number of elements copied is the minimum of len(src) and len(dst).

You copied 0, you should have copied len(s.slice).

As expected, your Push algorithm is inordinately slow:

append:

Benchmark_PushDefaultStack-4	 2000000	       941 ns/op	      49 B/op	       1 allocs/op

alediaferia:

Benchmark_PushDefaultStack-4	  100000	   1246315 ns/op	   42355 B/op	       1 allocs/op

This how append works: append complexity.

There are other things wrong too. Your benchmark results are often not valid.

答案2

得分: 3

我相信你的例子更快,因为你的数据集相当小,并且在分配时初始容量为0。在你的append版本中,通过更大幅度地提前增加块大小(增加20),你可以避免(在这种情况下)昂贵的realloc操作,这些操作会使你经历所有那些微不足道的小容量0、1、2、4、8、16、32、64等。

如果你的数据集要大得多,这个差距可能会被大规模复制的成本所抵消。我在Go语言中见过很多对切片的误用。明显的性能优势是通过使用合理的默认容量来获得的。

英文:

I believe your example is faster because you have a fairly small data set and are allocating with an initial capacity of 0. In your version of append you preempt a large amount of allocations by growing the block size more dramatically early (by 20) circumventing the (in this case) expensive reallocs that take you through all those trivially small capacities 0,1,2,4,8,16,32,64 ect

If your data sets were a lot larger this would likely be marginalized by the cost of large copies. I've seen a lot of misuse of slice in Go. The clear performance win is had by making your slice with a reasonable default capacity.

huangapple
  • 本文由 发表于 2015年6月16日 05:24:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/30855207.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定