Different performances in Go slices resize
Question
I'm spending some time experimenting with Go's internals, and I ended up writing my own implementation of a stack using slices.
As a Reddit user correctly pointed out in this post, and as another user outlined in this SO answer, Go already tries to optimise slice resizing.
It turns out, though, that I get better performance from my own implementation of slice growth than from sticking with the default one.
This is the structure I use for holding the stack:
type Stack struct {
    slice     []interface{}
    blockSize int
}

const s_DefaultAllocBlockSize = 20
This is my own implementation of the Push method:
func (s *Stack) Push(elem interface{}) {
    if len(s.slice)+1 == cap(s.slice) {
        slice := make([]interface{}, 0, len(s.slice)+s.blockSize)
        copy(slice, s.slice)
        s.slice = slice
    }
    s.slice = append(s.slice, elem)
}
This is the plain implementation:
func (s *Stack) Push(elem interface{}) {
    s.slice = append(s.slice, elem)
}
Running the benchmarks I've implemented with Go's testing package, my own implementation performs like this:
Benchmark_PushDefaultStack 20000000 87.7 ns/op 24 B/op 1 allocs/op
While relying on the plain append, the results are the following:
Benchmark_PushDefaultStack 10000000 209 ns/op 90 B/op 1 allocs/op
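(The benchmark code itself isn't shown in the question. A minimal sketch of what such a benchmark might look like with the testing package is below; NewStack is a hypothetical constructor assumed to return a *Stack with a zero-capacity slice and the default block size.)

package stack // assumed package name

import "testing"

// Sketch of the benchmark: pushes b.N elements onto a freshly created stack.
func Benchmark_PushDefaultStack(b *testing.B) {
    s := NewStack() // hypothetical constructor, not shown in the question
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        s.Push(i)
    }
}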
The machine I ran the tests on is an early-2011 MacBook Pro with a 2.3 GHz Intel Core i5 and 8 GB of 1333 MHz DDR3 RAM.
EDIT
The actual question is: is my implementation really faster than the default append behavior? Or am I not taking something into account?
Answer 1
Score: 4
Reading your code, tests, benchmarks, and results, it's easy to see that they are flawed. A full code review is beyond the scope of StackOverflow.
One specific bug:
// Push pushes a new element to the stack
func (s *Stack) Push(elem interface{}) {
    if len(s.slice)+1 == cap(s.slice) {
        slice := make([]interface{}, 0, len(s.slice)+s.blockSize)
        copy(slice, s.slice)
        s.slice = slice
    }
    s.slice = append(s.slice, elem)
}
Should be
// Push pushes a new element to the stack
func (s *Stack) Push(elem interface{}) {
    if len(s.slice)+1 == cap(s.slice) {
        slice := make([]interface{}, len(s.slice), len(s.slice)+s.blockSize)
        copy(slice, s.slice)
        s.slice = slice
    }
    s.slice = append(s.slice, elem)
}
From the documentation on copying slices:
> The function copy copies slice elements from a source src to a destination dst and returns the number of elements copied. The number of elements copied is the minimum of len(src) and len(dst).
You copied 0 elements; you should have copied len(s.slice) elements.
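A standalone sketch (not part of the original code) that makes the difference visible:

package main

import "fmt"

func main() {
    src := []interface{}{1, 2, 3}

    // Destination of length 0: copy transfers min(len(dst), len(src)) = 0 elements.
    dst := make([]interface{}, 0, 6)
    fmt.Println(copy(dst, src), dst) // prints: 0 []

    // Destination of length len(src): all three elements are transferred.
    dst = make([]interface{}, len(src), 6)
    fmt.Println(copy(dst, src), dst) // prints: 3 [1 2 3]
}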
As expected, your Push algorithm is inordinately slow:
append:
Benchmark_PushDefaultStack-4 2000000 941 ns/op 49 B/op 1 allocs/op
alediaferia:
Benchmark_PushDefaultStack-4 100000 1246315 ns/op 42355 B/op 1 allocs/op
This is how append works: append complexity.
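A small standalone sketch that prints the capacity whenever it changes makes the amortized growth visible (the exact growth factors are a runtime implementation detail and may differ between Go versions):

package main

import "fmt"

func main() {
    var s []interface{}
    prevCap := -1
    for i := 0; i < 100; i++ {
        s = append(s, i)
        if cap(s) != prevCap {
            // For small slices the capacity roughly doubles: 1, 2, 4, 8, 16, ...
            fmt.Printf("len=%d cap=%d\n", len(s), cap(s))
            prevCap = cap(s)
        }
    }
}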
There are other things wrong too. Your benchmark results are often not valid.
Answer 2
Score: 3
I believe your example is faster because you have a fairly small data set and are allocating with an initial capacity of 0. In your version of append you pre-empt a large number of allocations by growing the block size more aggressively early on (by 20), circumventing the (in this case) expensive reallocs that would take you through all of those trivially small capacities: 0, 1, 2, 4, 8, 16, 32, 64, etc.
If your data sets were a lot larger, this would likely be marginalized by the cost of the large copies. I've seen a lot of misuse of slices in Go. The clear performance win is had by making your slice with a reasonable default capacity, as in the sketch below.
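A sketch of that idea, assuming it lives in the same package as the Stack type from the question (1024 is only an illustrative value; pick something close to the expected workload):

// One up-front allocation sized for the expected number of elements,
// instead of growing through many small intermediate capacities.
s := Stack{
    slice:     make([]interface{}, 0, 1024),
    blockSize: s_DefaultAllocBlockSize,
}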