Why does appending to a slice perform so badly?


Question


I'm currently writing a game in Go and measuring its FPS. I'm seeing a loss of about 7 FPS when I use a for loop to append to slices like this:

vertexInfo := Opengl.OpenGLVertexInfo{}

// Append the same per-vertex values once for each of the sprite's 4 vertices.
for i := 0; i < 4; i++ {
	vertexInfo.Translations = append(vertexInfo.Translations, float32(s.x), float32(s.y), 0)
	vertexInfo.Rotations = append(vertexInfo.Rotations, 0, 0, 1, s.rot)
	vertexInfo.Scales = append(vertexInfo.Scales, s.xS, s.yS, 0)
	vertexInfo.Colors = append(vertexInfo.Colors, s.r, s.g, s.b, s.a)
}

I'm doing this for every sprite, on every draw. Why do I take such a big performance hit just from looping four times and appending the same values to these slices? Is there a more efficient way to do this? It's not as if I'm adding an exorbitant amount of data: each slice holds only about 16 elements, as shown above (4 x 4).

When I instead put all 16 elements directly into a single []float32{1..16}, FPS improves by about 4.

Update: I benchmarked each append, and each one seems to cost about 1 FPS. That seems like a lot, considering this data is fairly static and I only need 4 iterations.

Update: Added the GitHub repo: https://github.com/Triangle345/GT

Answer 1

Score: 6


The builtin append() has to allocate a new backing array whenever the capacity of the destination slice is less than the length the slice would have after the append. It then has to copy the existing elements from the destination into the newly allocated array, so there is significant overhead.
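To illustrate the reallocation described above, here is a minimal sketch. The helper `sameBackingArray` is hypothetical, written only for this demonstration: it checks whether two slices start at the same memory address. An append that fits within the capacity reuses the backing array, while one that exceeds it returns a slice backed by a freshly allocated copy.

```go
package main

import "fmt"

// sameBackingArray reports whether two non-empty slices start at the same
// address, i.e. whether they share a backing array from the same element.
func sameBackingArray(a, b []float32) bool {
	return &a[0] == &b[0]
}

func main() {
	s := make([]float32, 3, 4) // len 3, cap 4: room for exactly one more element

	t := append(s, 1) // fits in the existing capacity: no reallocation
	fmt.Println(len(t), cap(t), sameBackingArray(s, t)) // 4 4 true

	u := append(t, 2) // needs len 5 > cap 4: append allocates a new array and copies
	fmt.Println(len(u), cap(u) >= 5, sameBackingArray(t, u)) // 5 true false

	// The copy is independent: writing through u no longer affects t.
	u[0] = 42
	fmt.Println(t[0], u[0]) // 0 42
}
```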

The slices you append to start out nil (empty), because the empty struct literal you used to create your Opengl.OpenGLVertexInfo value leaves every slice field at its zero value. Even though append() plans ahead and allocates a bigger array than is needed for the elements being appended, chances are that in your case several reallocations are needed to complete the 4 iterations.
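A rough way to see those reallocations is to mirror one of the question's appends and count how often the capacity changes, since append only changes the capacity when it allocates a new array. This sketch uses a hypothetical helper, `appendTranslations`, that appends `x, y, 0` four times like the `Translations` line in the question. The exact count starting from a nil slice depends on the Go version's growth policy, so it is printed rather than predicted; with a capacity of 12 reserved up front, there are none.

```go
package main

import "fmt"

// appendTranslations mirrors the question's Translations append: it appends
// x, y, 0 once per vertex for 4 vertices. It returns the grown slice and the
// number of times the capacity changed. append only changes the capacity when
// it allocates a new backing array and copies the old elements into it.
func appendTranslations(dst []float32, x, y float32) ([]float32, int) {
	reallocs := 0
	for i := 0; i < 4; i++ {
		oldCap := cap(dst)
		dst = append(dst, x, y, 0)
		if cap(dst) != oldCap {
			reallocs++
		}
	}
	return dst, reallocs
}

func main() {
	// A nil slice, which is what an empty struct literal leaves in each field.
	_, n := appendTranslations(nil, 1, 2)
	fmt.Println("reallocations starting from nil:", n)

	// Capacity for all 4 * 3 = 12 values reserved up front.
	_, m := appendTranslations(make([]float32, 0, 12), 1, 2)
	fmt.Println("reallocations with capacity 12:", m) // 0
}
```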

You can avoid the reallocations by creating and initializing vertexInfo like this:

// Each slice literal repeats the per-vertex values for all 4 vertices, so every
// slice is allocated once, with exactly the length it needs.
vertexInfo := Opengl.OpenGLVertexInfo{
	Translations: []float32{float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0},
	Rotations:    []float64{0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot},
	Scales:       []float64{s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0},
	Colors:       []float64{s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a},
}

Also note that this struct literal only removes the need to reallocate the arrays behind the slices during initialization. If other places in your code (which we can't see) append further elements to these slices, those appends may still cause reallocations. If that is the case, create the slices with a bigger capacity that covers the "future" appends (e.g. make([]float64, 16, 32)).
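The `make([]float64, 16, 32)` suggestion can be sketched as follows. One detail worth spelling out: a slice made with length 16 already holds 16 zero values, so the initial data should be written by index; `append` adds after those 16 elements and only reallocates once the length would exceed the capacity of 32. The helper `fillThenAppend` is hypothetical.

```go
package main

import "fmt"

// fillThenAppend creates a slice with length 16 and capacity 32, writes the
// 16 initial values by index, then appends extra values. It reports whether
// the append reused the original backing array.
func fillThenAppend(extra ...float64) (s []float64, reused bool) {
	s = make([]float64, 16, 32) // 16 zero values, room for 16 more
	for i := range s {
		s[i] = float64(i) // set by index: append would add after these 16 zeros instead
	}
	first := &s[0]
	s = append(s, extra...)
	return s, &s[0] == first
}

func main() {
	s, reused := fillThenAppend(1, 2, 3, 4)
	fmt.Println(len(s), cap(s), reused) // 20 32 true

	s, reused = fillThenAppend(make([]float64, 17)...)
	fmt.Println(len(s), cap(s) > 32, reused) // 33 true false
}
```

The second call appends 17 values, one more than the spare capacity, so it falls back to a reallocation.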

Answer 2

Score: 4


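An empty slice with zero capacity has no room for new elements, so the first append must allocate memory. Then you do more appends, and each one that exceeds the current capacity has to allocate an even bigger array and copy the existing elements into it.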

To speed it up, use a fixed-size array, use make to create a slice with the correct length, or initialize the slice with its items when you declare it.
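The first two suggestions can be sketched like this for the 12 translation values (4 vertices, 3 components each). The `sprite` type and both helper functions are hypothetical stand-ins, not code from the question's repository; the `float64` field type is an assumption, which is why the values are converted to `float32` as in the question.

```go
package main

import "fmt"

// sprite is a hypothetical stand-in for the question's sprite type.
// The float64 field type is an assumption.
type sprite struct {
	x, y float64
}

// translationsArray fills a fixed-size array. Its size is part of the type,
// so there is no capacity to grow and no append involved.
func translationsArray(s sprite) [12]float32 {
	var a [12]float32 // zero-initialized, so every third value is already 0
	for v := 0; v < 4; v++ {
		a[v*3] = float32(s.x)
		a[v*3+1] = float32(s.y)
	}
	return a
}

// translationsSlice allocates once with make, using the exact final length,
// and fills the slice by index instead of appending.
func translationsSlice(s sprite) []float32 {
	t := make([]float32, 12)
	for v := 0; v < 4; v++ {
		t[v*3] = float32(s.x)
		t[v*3+1] = float32(s.y)
	}
	return t
}

func main() {
	s := sprite{x: 1.5, y: 2.5}
	fmt.Println(translationsArray(s)) // [1.5 2.5 0 1.5 2.5 0 1.5 2.5 0 1.5 2.5 0]
	sl := translationsSlice(s)
	fmt.Println(len(sl), cap(sl)) // 12 12
}
```

If an API expects a `[]float32`, the array can be passed as `arr[:]`, which slices it without copying.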

huangapple
  • This article was posted on 2015-08-28 13:54:49
  • When reposting, please keep this article's link: https://go.coder-hub.com/32264208.html