How to return an object count from a compute shader
Question
I've implemented occlusion culling via a compute shader, in conjunction with indirect rendering, in HLSL on DX12.
I would like to get a count of the culled objects back to the CPU for output to the console.
Most of my code came from looking at existing examples, and I'm not really aware of the best methods for doing what I guess is a reduction. I've seen things like InterlockedAdd, but I don't know if that's the route to take either.
My current code looks like this (details of culling omitted):
SamplerState DepthSampler : register(s0);
StructuredBuffer<IndirectCommand> inputCommands : register(t0);          // SRV: Indirect commands
StructuredBuffer<VSIndirectConstants> indirectConstants : register(t1);  // SRV: per-object constants
StructuredBuffer<TransformData> TransformBuffer : register(t2);          // SRV: transforms (per object)
Texture2D<float> DepthTexture : register(t3);
AppendStructuredBuffer<IndirectCommand> outputCommands : register(u0);   // UAV: Processed indirect commands

bool isOccluded(uint index)
{
    bool occluded = false;
    uint transformIndex = indirectConstants[index].transformIndex;
    TransformData tData = TransformBuffer[transformIndex];
    VSIndirectConstants constants = indirectConstants[index];
    ...
}

[numthreads(threadBlockSize, 1, 1)]
void main(uint3 groupId : SV_GroupID, uint groupIndex : SV_GroupIndex)
{
    // Each thread of the CS operates on one of the indirect commands.
    uint index = (groupId.x * threadBlockSize) + groupIndex;

    // Don't attempt to access commands that don't exist if more threads are allocated
    // than commands.
    if (index < (uint)commandCount)
    {
        if (isWithinFrustum(index) && !isOccluded(index))
        {
            outputCommands.Append(inputCommands[index]);
        }
    }
}
I'd just like to increment a counter somewhere whenever I cull an object, and be able to read it back on the CPU efficiently.
I'd appreciate suggestions on the approach to take; it shouldn't need too much detail code-wise.
Answer 1
Score: 1
A rough sketch would be like this:
- Create a buffer with format `R32_UINT` and create its `UAV`.
- In the shader, increment it using `InterlockedAdd`, as you mentioned.
- Also create a readback buffer with the same format. You will need as many of these as you have back buffers (or just one buffer large enough to contain all the "sub-buffers").
- Use `CopyBufferRegion` to copy from the original buffer to the readback buffer.
- At the end, map the readback buffer if you haven't already, and then read the `uint32`.
This is without going into too much API detail.
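The shader side of that outline might look like the sketch below. It extends the `main` function from the question and relies on the question's own declarations (`inputCommands`, `outputCommands`, `isWithinFrustum`, `isOccluded`, `commandCount`, `threadBlockSize`), so it is a fragment rather than a complete shader. The counter name `culledCount`, its `u1` register slot, and the choice of a typed `RWBuffer<uint>` are assumptions made for illustration, not part of the original code. It also assumes the counter is reset to zero before each dispatch (for example with `ClearUnorderedAccessViewUint`), since nothing clears it automatically.

```hlsl
// Assumption: a one-element counter buffer bound at u1 through a typed
// R32_UINT UAV. The name and register slot are hypothetical.
RWBuffer<uint> culledCount : register(u1);

[numthreads(threadBlockSize, 1, 1)]
void main(uint3 groupId : SV_GroupID, uint groupIndex : SV_GroupIndex)
{
    // Each thread of the CS operates on one of the indirect commands.
    uint index = (groupId.x * threadBlockSize) + groupIndex;

    // Skip threads that fall past the end of the command list.
    if (index < (uint)commandCount)
    {
        if (isWithinFrustum(index) && !isOccluded(index))
        {
            // Visible: keep the command for the indirect draw.
            outputCommands.Append(inputCommands[index]);
        }
        else
        {
            // Culled: atomically add one to the shared counter.
            InterlockedAdd(culledCount[0], 1);
        }
    }
}
```

Because every culled thread issues its own atomic, this trades some contention for simplicity; for a debug counter printed to the console that is usually an acceptable cost.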
The approach you mentioned in the comments, using a `UAV` counter, is also possible, but it doesn't make your life much easier. You still need to create the buffer with `R32_UINT` format (which will be your counter buffer); only this time, when you create the `UAV` of your `outputCommands` with `CreateUnorderedAccessView`, the `pCounterResource` argument won't be `nullptr`. You will save yourself writing the `InterlockedAdd` code, though. However, the rest of the procedure is the same if you want to read it on the CPU.