Appending to slice bad performance.. why?
Question
I'm currently creating a game in Go and measuring the FPS. I lose about 7 FPS when I use a for loop to append to slices like so:
vertexInfo := Opengl.OpenGLVertexInfo{}
for i := 0; i < 4; i++ {
	vertexInfo.Translations = append(vertexInfo.Translations, float32(s.x), float32(s.y), 0)
	vertexInfo.Rotations = append(vertexInfo.Rotations, 0, 0, 1, s.rot)
	vertexInfo.Scales = append(vertexInfo.Scales, s.xS, s.yS, 0)
	vertexInfo.Colors = append(vertexInfo.Colors, s.r, s.g, s.b, s.a)
}
I'm doing this for every sprite, on every draw. Why do I get such a large performance hit from looping just four times and appending the same values to these slices? Is there a more efficient way to do this? It's not as if I'm adding an exorbitant amount of data: each slice ends up with about 16 elements, as shown above (4 x 4).
When I instead put all 16 elements in a single []float32{1..16} literal, the FPS improves by about 4.
Update: I benchmarked each append, and each one seems to cost about 1 FPS. That seems like a lot considering this data is fairly static and I only need 4 iterations.
Update: Added GitHub repo https://github.com/Triangle345/GT
Answer 1
Score: 6
The builtin append() needs to create a new backing array if the capacity of the destination slice is less than the length the slice would have after the append. It must then also copy the current elements from the old array into the newly allocated one, so there is a lot of overhead.
The slices you append to start out empty (nil), since you created your Opengl.OpenGLVertexInfo value with an empty struct literal. Even though append() plans ahead and allocates a bigger array than is needed for the appended elements, chances are that in your case several reallocations are needed to complete the 4 iterations.
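To make the reallocation behaviour visible, here is a minimal sketch that counts how often the capacity of a slice changes while appending 4 values in each of 4 iterations. The helper countReallocs is hypothetical and not part of the question's code, and the exact growth pattern depends on the Go version, so the printed numbers may differ between releases.

```go
package main

import "fmt"

// countReallocs (hypothetical helper) appends perAppend values per iteration
// to a slice that starts with capacity startCap. It returns how many times
// the capacity changed, i.e. how many times append had to allocate a new
// backing array and copy the existing elements into it.
func countReallocs(startCap, iterations, perAppend int) int {
	s := make([]float32, 0, startCap)
	vals := make([]float32, perAppend)
	reallocs := 0
	for i := 0; i < iterations; i++ {
		before := cap(s)
		s = append(s, vals...)
		if cap(s) != before {
			reallocs++
		}
	}
	return reallocs
}

func main() {
	fmt.Println("starting empty:      ", countReallocs(0, 4, 4))
	fmt.Println("capacity 16 up front:", countReallocs(16, 4, 4))
}
```

When the capacity is sufficient from the start, append only writes into the existing array and the count stays at zero.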
You can avoid the reallocations if you create and initialize vertexInfo like this:
vertexInfo := Opengl.OpenGLVertexInfo{
	Translations: []float32{float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0},
	Rotations:    []float64{0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot},
	Scales:       []float64{s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0},
	Colors:       []float64{s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a},
}
Note that with this struct literal, no backing arrays need to be reallocated while the slices are built. But if other parts of your code (which we don't see) append further elements to these slices, those appends may cause reallocations. In that case, create the slices with a larger capacity up front to cover the "future" appends (e.g. make([]float64, 16, 32)).
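If the loop is easier to maintain than a long literal, the same effect can be had by giving each slice its final capacity with make and then appending as before. The sketch below assumes hypothetical stand-in types sprite and vertexInfo with float32 fields, because the real Opengl.OpenGLVertexInfo definition is not shown in the question.

```go
package main

import "fmt"

// sprite and vertexInfo are hypothetical stand-ins for the question's types;
// the real field types in Opengl.OpenGLVertexInfo may differ.
type sprite struct {
	x, y, rot, xS, yS, r, g, b, a float32
}

type vertexInfo struct {
	Translations, Rotations, Scales, Colors []float32
}

const verts = 4 // vertices per sprite quad

// buildVertexInfo sizes every slice once, so the appends in the loop never
// outgrow the capacity and never trigger a reallocation.
func buildVertexInfo(s sprite) vertexInfo {
	v := vertexInfo{
		Translations: make([]float32, 0, verts*3),
		Rotations:    make([]float32, 0, verts*4),
		Scales:       make([]float32, 0, verts*3),
		Colors:       make([]float32, 0, verts*4),
	}
	for i := 0; i < verts; i++ {
		v.Translations = append(v.Translations, s.x, s.y, 0)
		v.Rotations = append(v.Rotations, 0, 0, 1, s.rot)
		v.Scales = append(v.Scales, s.xS, s.yS, 0)
		v.Colors = append(v.Colors, s.r, s.g, s.b, s.a)
	}
	return v
}

func main() {
	v := buildVertexInfo(sprite{x: 1, y: 2, rot: 3, xS: 1, yS: 1, r: 1, g: 1, b: 1, a: 1})
	fmt.Println(len(v.Translations), cap(v.Translations), len(v.Colors), cap(v.Colors))
}
```

Each slice still costs one allocation per call here; the saving is that the backing array is never reallocated and copied while it is filled.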
Answer 2
Score: 4
An empty slice has no backing storage, so to append to it, Go must allocate memory. Each further append that exceeds the current capacity has to allocate a larger array again.
To speed this up, use a fixed-size array, use make to create a slice with the correct length, or initialize the slice with its items when you declare it.
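As a rough illustration of the fixed-size array option, the sketch below fills a [16]float32 by index instead of appending. The type quadColors and the function fillColors are hypothetical names chosen for this example; they do not come from the question's repository.

```go
package main

import "fmt"

const verts = 4 // vertices per sprite quad

// quadColors is a hypothetical fixed-size buffer: 4 vertices x RGBA.
// Its size is known at compile time, so filling it never reallocates.
type quadColors [verts * 4]float32

// fillColors writes the same RGBA value for every vertex by index.
func fillColors(dst *quadColors, r, g, b, a float32) {
	for i := 0; i < verts; i++ {
		dst[i*4+0] = r
		dst[i*4+1] = g
		dst[i*4+2] = b
		dst[i*4+3] = a
	}
}

func main() {
	var c quadColors
	fillColors(&c, 1, 0.5, 0.25, 1)
	// c[:] yields a []float32 view of the array for APIs that expect a slice.
	fmt.Println(len(c), c[:4])
}
```

The array can be kept alongside the sprite and refilled on every draw, which suits data that is mostly static.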