
The correct way to manipulate an object's positional data in OpenGL: shaders or buffers?

Question


I've been trying to learn OpenGL using the newer shader pipeline functionality over the deprecated immediate mode fixed pipeline gl. There's a couple things I'm confused about both in terms of performance but also in terms of design, whether what I'm doing is the "correct" way, or commonly accepted way to be doing things.

In older versions of GL, I could use glTranslate to manipulate my object across the screen, and use matrix stacks to push copies of my object+translate each one individually. With newer GL this isn't possible, so I've been experimenting with ways to achieve similar functionality.

Minimal example of my environment:

glGenVertexArrays(1, &_vao);
glBindVertexArray(_vao);

// centered square
float verts[] = {
    -0.5,    0.5,
     0.5,    0.5,
    -0.5,   -0.5,
     0.5,   -0.5,
};

glGenBuffers(1, &_vbo);
glBindBuffer(GL_ARRAY_BUFFER, _vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);

unsigned int idx[] = {
    0, 1, 2,
    1, 2, 3
};

glGenBuffers(1, &_ebo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, _ebo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(idx), idx, GL_STATIC_DRAW);

glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(float), 0);
glEnableVertexAttribArray(0);

while (renderLoop) {
    glUseProgram(_program);
    glBindVertexArray(_vao);
    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);

    // swap buffers
}

I want to translate my square around the screen. I'm aware that its position depends on the vertices I passed into my array buffer, and that those are manipulated by the vertex shader. From that, I deduced that there are two logical entry points for manipulating my data: the buffer directly, or the shader. My only dilemma is that I'm not sure which is the correct way to do it, both for performance and for maintainability. I want to develop the right habits from the beginning so that I don't carry a bad habit throughout my development, but I'm not sure what I'm meant to do.

If I were to rewrite the buffer directly, I'd have to change it every render tick, which could be costly. I'd also likely have to switch it to GL_DYNAMIC_DRAW since I'd be changing it so often. However, the main benefit is that I can manipulate each point separately, which may be exactly what's intended. Say, for example, I wanted to create my object at my mouse pointer. I'd need the mouse pointer's x and y coordinates, then scale them into normalized width/height coordinates, all of which requires me to rewrite the buffer.

while (renderLoop) {
    glUseProgram(_program);
    glBindVertexArray(_vao);

    manipulateVerts(verts);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_DYNAMIC_DRAW); // also changed above, before render loop

    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);

    // swap buffers
}

The other possible way I considered was via shaders, where I have a uniform position variable and pass coordinates into my shader. Let's say my manipulateVerts function moved x by -0.6 and y by 0.4. I could pass those movement offsets via a uniform vec2. This seems like the more logical thing to do given that a vertex shader is designed to manipulate vertex data. However, I can only manipulate each vertex independently; if one vertex depends on another to know its new position, I can't do that, which poses a problem for the shader approach.

#version 330 core
layout (location = 0) in vec2 pos;
uniform vec2 offset;

void main() {
    gl_Position = vec4(pos.x + offset.x, pos.y + offset.y, 0.0, 1.0);
}

Then within my render loop, I could lookup the uniform id, and change it.

while (renderLoop) {
    glUseProgram(_program);
    glBindVertexArray(_vao);

    float offset[] = { -0.6, 0.4 };
    int _offset = glGetUniformLocation(_program, "offset"); // returns GLint (-1 if not found); could be looked up once, before the loop
    glUniform2fv(_offset, 1, offset);

    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);

    // swap buffers
}

This seems like a good way to do it, until I consider that a shader program is usually meant to be used with multiple, hundreds if not thousands of VBOs. Although this method works for my single VBO, what about multiple different VBOs? Would I just need to set the offset to 0 for each? Would it make more sense to run through the VBOs, binding each as intended and then setting my offset, or to manipulate the vertices themselves? And what if I had multiple copies of a VBO? I couldn't just keep translating its vertices, since that would affect the other copies, so I'd have to make multiple copies in memory, eating up a lot of RAM for a completely unnecessary reason.

I think I'm coming to the conclusion that it simply depends, but I'd like an outside opinion. I'm fairly new to OpenGL and GLSL shaders, so my inexperience may be clouding my ability to see the rational choice.

Answer 1

Score: 1


First, don't forget there is no substitute for measuring. Performance rules of thumb will get you far, but they're just telling you the collected wisdom of other developers, not the ground truth and they could be wrong for your program. If you want ultimate performance in any program, you have to measure, change something, measure again. Over and over and over.

You are right that it simply depends. That said, the same rules of thumb work for most OpenGL programs.

If you update vertex coordinates in the buffer then... you have to update all the vertex coordinates in the buffer. The CPU calculates every coordinate individually and tells the GPU what they are. This is fine if you just have a handful of vertices (it can be a big handful, several thousand vertices), or if you have a really complex effect that just can't be done in a shader (but you'd be surprised what can be done in shaders).

When you have a million vertices, it's not so fine and you want the GPU to do that work, since that's what it's there for.

If you have a handful (that can be several hundred) of different objects, with plenty of vertices each, then it makes sense to set the offset uniform, draw one object, set the uniform, draw another object, etc. This is generally accepted. I'd bet most game engines work this way most of the time.

By the way, in 3D it's a lot more common to use matrices instead of just offsets. Matrices allow for translation (offsetting), rotation, resizing and camera perspective.

You can stop here because it's how most 3D games work. But I already wrote the more advanced ways, so you may as well read on out of curiosity...

The communication path between the CPU and GPU (not just the PCIe slot but also the OpenGL driver) isn't terribly fast. It's fast alright, but it's peanuts compared to the raw processing power the GPU has available, which is as much as the world's fastest supercomputer from 1996 (I actually checked; it's called ASCI Red). When you insist on calculating all the vertex data on the CPU (method 1), the GPU wastes 99% of its time just twiddling its thumbs waiting to hear the next vertex. When you send a single uniform and draw command for each object (method 2) that's a lot better, but maybe it's possible to do even better.

If you have a lot of the same shape to draw, especially if the object doesn't have a lot of vertices, just sending the uniform and draw command over and over can be too much wasted time. For this situation there is a capability called instancing. It draws the same shape many times with one command. You can make a buffer full of offsets, as well as your buffer full of vertices, and use the instanced draw command; your shader will then run many times on the same vertices, but gl_InstanceID will be different, and you can use this variable to get a different offset from the offset buffer. You might find this to be a useful way to render trees, or blades of grass.

If you want to draw loads and loads of different shapes, you can use indirect drawing, where you feed the GPU a buffer full of other draw commands. You're feeding it draw commands, not glVertexAttribPointer commands, so all the shapes have to be in different parts of the same vertex buffer. It also supports instancing so you can draw lots of one shape. For example (and this is an example without instancing), you could have a buffer full of level vertices, and then you could tell the GPU which parts to render depending on where the player is, and which parts they can see. Then you only need to update that draw command buffer when the player moves to a different part of the level.

By the way, as you can see with indirect drawing, there's no need to make it so that every shape has its own VBO. If you're going for high performance, you may as well stuff different shapes into the same VBO as much as possible so you can use indirect drawing. If you're just using the same old method 2 that everyone uses, you don't need to but you still can, if you want. Might speed up loading times if you have to load 1 VBO instead of 10000. (Then again, it might not. Measure!)

P.S. shaders have nothing to do with VBOs. It's not "one shader = thousands of VBOs", they can go in whatever combination you want. Buffers hold data and shaders process it and churn out vertices to go on the screen. It's not even like they have to be VBOs - a VBO just means a buffer that holds vertex data but it's not like the GPU knows it holds that.

huangapple
  • Published on March 7, 2023, 01:58:14
  • Please keep this link when reposting: https://go.coder-hub.com/75654240.html