How to return an object count from a compute shader

Question

I've implemented occlusion culling with a compute shader, in conjunction with indirect rendering, in HLSL on DX12.

I would like to get a count of the culled objects back to the CPU for output to the console.

Most of the code I have came from looking at existing examples, and I'm not really aware of the best methods for doing what I guess is a reduction. I've seen things like InterlockedAdd, but I don't know whether that's the route to take either.

My current code looks like this (details of culling omitted):

SamplerState DepthSampler                                  : register(s0);
StructuredBuffer<IndirectCommand> inputCommands            : register(t0);      // SRV: Indirect commands
StructuredBuffer<VSIndirectConstants> indirectConstants    : register(t1);      // SRV: of per-object constants
StructuredBuffer<TransformData> TransformBuffer            : register(t2);      // SRV: transforms (per object)
Texture2D<float> DepthTexture                              : register(t3);
AppendStructuredBuffer<IndirectCommand> outputCommands     : register(u0);      // UAV: Processed indirect commands

bool isOccluded(uint index)
{
    bool occluded = false;
    uint transformIndex = indirectConstants[index].transformIndex;
    TransformData tData = TransformBuffer[transformIndex];
    VSIndirectConstants constants = indirectConstants[index];
    // ...
}

[numthreads(threadBlockSize, 1, 1)]
void main(uint3 groupId : SV_GroupID, uint groupIndex : SV_GroupIndex)
{
    // Each thread of the CS operates on one of the indirect commands.
    uint index = (groupId.x * threadBlockSize) + groupIndex;

    // Don't attempt to access commands that don't exist if more threads are allocated
    // than commands.
    if (index < (uint)commandCount)
    {
        if (isWithinFrustum(index) && !isOccluded(index))
        {
            outputCommands.Append(inputCommands[index]);
        }
    }
}

I'd just like to increment a counter somewhere whenever I cull an object and be able to read it back from the CPU efficiently.
I would appreciate suggestions on the approach to take; it shouldn't need too much detail code-wise.

Answer 1

Score: 1

A rough sketch would be like this:

  1. Create a small buffer for the counter and create a UAV for it with format R32_UINT.
  2. In the shader, increment it using InterlockedAdd, as you mentioned.
  3. Also create a readback buffer to receive the same R32_UINT value. You will need as many of these as you have back buffers (or just one buffer large enough to contain all the "sub-buffers"), so that each frame in flight has its own copy.
  4. Then use CopyBufferRegion to copy from the original buffer to the readback buffer.
  5. At the end, map the readback buffer, if you haven't already, and read the uint32.
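Steps 1 and 2 might look roughly like the following on the shader side. This is only a sketch meant to replace main in the question's shader, so it relies on the question's own declarations (threadBlockSize, commandCount, isWithinFrustum, isOccluded, inputCommands, outputCommands). The name culledCount and the register u1 are assumptions made for illustration and would have to match your root signature.

```hlsl
// Hypothetical counter UAV: a typed buffer viewed as R32_UINT with a single element.
// The name and register slot are assumptions, not part of the original shader.
// The application must reset this value to zero before each dispatch.
RWBuffer<uint> culledCount : register(u1);

[numthreads(threadBlockSize, 1, 1)]
void main(uint3 groupId : SV_GroupID, uint groupIndex : SV_GroupIndex)
{
    // Each thread of the CS operates on one of the indirect commands.
    uint index = (groupId.x * threadBlockSize) + groupIndex;

    // Skip threads that have no command to process.
    if (index < (uint)commandCount)
    {
        if (isWithinFrustum(index) && !isOccluded(index))
        {
            outputCommands.Append(inputCommands[index]);
        }
        else
        {
            // The object was culled: atomically add one to the shared counter.
            InterlockedAdd(culledCount[0], 1);
        }
    }
}
```

The counter is not cleared automatically, so without a reset before each dispatch it would accumulate across frames.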

This is without going into too much API detail.

The approach you mentioned in the comments, using a UAV counter, is also possible, but it doesn't make your life much easier. You still need to create the R32_UINT buffer (which will be your counter buffer); the only difference is that when you create the UAV of your outputCommands with CreateUnorderedAccessView, the pCounterResource argument won't be nullptr. You will save yourself writing the InterlockedAdd code, though. The rest of the procedure is the same if you want to read the value on the CPU.

huangapple
  • Posted on May 22, 2023, 16:23:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76304283.html