2015年6月16日 05:24:55go评论76阅读模式

英文:

Different performances in Go slices resize

问题

我正在花时间研究Go语言的内部机制，并且我写了一个使用切片实现的自己的栈。正如一个Reddit用户在这篇帖子中正确指出的，并且另一个用户在这个SO答案中概述的那样，Go语言已经尝试优化切片的调整大小。

然而，事实证明，与使用默认的切片调整大小相比，我自己实现的切片增长方法更具性能优势。

这是我用于保存栈的结构：

type Stack struct {
    slice []interface{}
    blockSize int
}

const s_DefaultAllocBlockSize = 20;

这是我自己实现的Push方法：

func (s *Stack) Push(elem interface{}) {
    if len(s.slice) + 1 == cap(s.slice) {
        slice := make([]interface{}, 0, len(s.slice) + s.blockSize)
        copy(slice, s.slice)
        s.slice = slice
    }

    s.slice = append(s.slice, elem)
}

这是普通的实现：

func (s *Stack) Push(elem interface{}) {
    s.slice = append(s.slice, elem)
}

通过使用Go语言的testing包实现的基准测试，我的自己实现的性能如下：

Benchmark_PushDefaultStack	20000000	        87.7 ns/op	      24 B/op	       1 allocs/op

而依赖于普通的append的结果如下：

Benchmark_PushDefaultStack	10000000	       209 ns/op	      90 B/op	       1 allocs/op

我在一台2011年早期的Mac Book Pro上运行测试，配置为2.3 GHz Intel Core i5处理器，8GB 1333MHz DDR3内存。

编辑
实际问题是：我的实现真的比默认的append行为更快吗？还是我没有考虑到某些因素？

英文:

I'm spending some time experimenting with Go's internals and I ended up writing my own implementation of a stack using slices.
As correctly pointed out by a reddit user in this post and as outlined by another user in this SO answer Go already tries to optimise slices resize.

Turns out, though, that I rather have performance gains using my own implementation of slice growing rather than sticking with the default one.

This is the structure I use for holding the stack:

type Stack struct {
    slice []interface{}
    blockSize int
}

const s_DefaultAllocBlockSize = 20;

This is my own implementation of the Push method

func (s *Stack) Push(elem interface{}) {
    if len(s.slice) + 1 == cap(s.slice) {
	    slice := make([]interface{}, 0, len(s.slice) + s.blockSize)
	    copy(slice, s.slice)
	    s.slice = slice
    }

    s.slice = append(s.slice, elem)
}

This is a plain implementation

func (s *Stack) Push(elem interface{}) {
    s.slice = append(s.slice, elem)
}

Running the benchmarks I've implemented using Go's testing package my own implementation performs this way:

Benchmark_PushDefaultStack	20000000	        87.7 ns/op	      24 B/op	       1 allocs/op

While relying on the plain append the results are the following

Benchmark_PushDefaultStack	10000000	       209 ns/op	      90 B/op	       1 allocs/op

The machine I run tests on is an early 2011 Mac Book Pro, 2.3 GHz Intel Core i5 with 8GB of RAM 1333MHz DDR3

EDIT
The actual question is: is my implementation really faster than the default append behavior? Or am I not taking something into account?

答案1

得分: 4

阅读您的代码、测试、基准测试和结果后，很容易看出它们存在缺陷。完整的代码审查超出了StackOverflow的范围。

一个具体的错误。

// Push pushes a new element to the stack
func (s *Stack) Push(elem interface{}) {
    if len(s.slice)+1 == cap(s.slice) {
        slice := make([]interface{}, 0, len(s.slice)+s.blockSize)
        copy(slice, s.slice)
        s.slice = slice
    }

    s.slice = append(s.slice, elem)
}

应该改为

// Push pushes a new element to the stack
func (s *Stack) Push(elem interface{}) {
    if len(s.slice)+1 == cap(s.slice) {
        slice := make([]interface{}, len(s.slice), len(s.slice)+s.blockSize)
        copy(slice, s.slice)
        s.slice = slice
    }

    s.slice = append(s.slice, elem)
}

复制切片

函数copy将切片元素从源src复制到目标dst，并返回复制的元素数。复制的元素数是len(src)和len(dst)中的最小值。

您复制了0，应该复制len(s.slice)。

如预期的那样，您的Push算法非常慢：

append:

Benchmark_PushDefaultStack-4	 2000000	       941 ns/op	      49 B/op	       1 allocs/op

alediaferia:

Benchmark_PushDefaultStack-4	  100000	   1246315 ns/op	   42355 B/op	       1 allocs/op

这是append的工作方式：append复杂度。

还有其他问题。您的基准测试结果通常无效。

英文:

Reading your code, tests, benchmarks, and results it's easy to see that they are flawed. A full code review is beyond the scope of StackOverflow.

One specific bug.

// Push pushes a new element to the stack
func (s *Stack) Push(elem interface{}) {
	if len(s.slice)+1 == cap(s.slice) {
		slice := make([]interface{}, 0, len(s.slice)+s.blockSize)
		copy(slice, s.slice)
		s.slice = slice
	}

	s.slice = append(s.slice, elem)
}

Should be

// Push pushes a new element to the stack
func (s *Stack) Push(elem interface{}) {
	if len(s.slice)+1 == cap(s.slice) {
		slice := make([]interface{}, len(s.slice), len(s.slice)+s.blockSize)
		copy(slice, s.slice)
		s.slice = slice
	}

	s.slice = append(s.slice, elem)
}

> copying
> slices
>
> The function copy copies slice elements from a source src to a
> destination dst and returns the number of elements copied. The
> number of elements copied is the minimum of len(src) and len(dst).

You copied 0, you should have copied len(s.slice).

As expected, your Push algorithm is inordinately slow:

append:

Benchmark_PushDefaultStack-4	 2000000	       941 ns/op	      49 B/op	       1 allocs/op

alediaferia:

Benchmark_PushDefaultStack-4	  100000	   1246315 ns/op	   42355 B/op	       1 allocs/op

This how append works: append complexity.

There are other things wrong too. Your benchmark results are often not valid.

答案2

得分: 3

我相信你的例子更快，因为你的数据集相当小，并且在分配时初始容量为0。在你的append版本中，通过更大幅度地提前增加块大小（增加20），你可以避免（在这种情况下）昂贵的realloc操作，这些操作会使你经历所有那些微不足道的小容量0、1、2、4、8、16、32、64等。

如果你的数据集要大得多，这个差距可能会被大规模复制的成本所抵消。我在Go语言中见过很多对切片的误用。明显的性能优势是通过使用合理的默认容量来获得的。

英文:

I believe your example is faster because you have a fairly small data set and are allocating with an initial capacity of 0. In your version of append you preempt a large amount of allocations by growing the block size more dramatically early (by 20) circumventing the (in this case) expensive reallocs that take you through all those trivially small capacities 0,1,2,4,8,16,32,64 ect

If your data sets were a lot larger this would likely be marginalized by the cost of large copies. I've seen a lot of misuse of slice in Go. The clear performance win is had by making your slice with a reasonable default capacity.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Go切片调整大小的不同性能表现

问题

答案1

答案2

在 Mac 上运行 IP4 服务器

Golang SSH隧道连接到远程Postgres数据库

How can I iterate over a pair of strings rune by rune in Go?

Get the first and last day of current month in Go/Golang?

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论