Shared memory with Metal compute shaders (how to access data outside of shared thread group memory?)

Question

I've written a Metal compute shader that:

  • accepts a 2-D Metal texture
  • writes to two Metal textures
    • one with vertical motion blur
    • one with horizontal motion blur

Implementation:

constexpr sampler s(coord::pixel, address::clamp_to_edge);

kernel void motion_blur(texture2d<half, access::sample> gray_source [[ texture(0) ]],
                        texture2d<half, access::write> b_hor [[ texture(1) ]],
                        texture2d<half, access::write> b_ver [[ texture(2) ]],
                        uint2 globalId [[ thread_position_in_grid ]]) {
    float2 c = static_cast<float2>(globalId);
    
    // calculate the value of a motion-blurred image.
    half filter_len = 15;
    int lower_bound = int(floor(filter_len / 2.0)) * -1.0;
    int upper_bound = int(floor(filter_len / 2.0) + 1.0);
    half g_x = 0;
    half g_y = 0;
    for (int i = lower_bound; i < upper_bound; i++) {
        half xGray = gray_source.sample(s, c, int2(i, 0)).x / filter_len;
        g_x += xGray;
        half yGray = gray_source.sample(s, c, int2(0, i)).x / filter_len;
        g_y += yGray;
    }
    b_hor.write(g_x, globalId);
    b_ver.write(g_y, globalId);
}

Above, the filter length is set to 15, but I need a filter length of ~30.

Experimentally, filter lengths greater than 15 do not increase the amount of motion blur the way I would expect.

I'm not sure, but I have a hunch that this is related to my threadgroup size:

threadgroupSize = MTLSize(width: 16, height: 16, depth: 1)

but increasing the width and height to 32 does not have the desired effect either. I'm guessing that's due to hardware limitations.

I'm new to shader programming and have even less experience with Metal. What can I do to give each thread access to larger portions of the texture?

Answer 1

Score: 0

Thanks to a pointer from a very helpful and friendly engineer, I now know what my issue was. In the following line, I was misusing the offset argument of sample():

half xGray = gray_source.sample(s, c, int2(i, 0)).x / filter_len;

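I guess only offset values from -8 to 7 work, which would explain why a filter length of 15 (offsets -7 to 7) was the longest that behaved as expected. I'm looking through the Metal Shading Language spec for this and will report back if I find it. That said, shifting the coordinate itself, as below, works just fine: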

half xGray = gray_source.sample(s, float2(c.x + i, c.y)).x / filter_len;

I was just confused about the correct way to specify the coordinate I wanted to sample.

huangapple
  • Posted on June 30, 2023 at 04:58:41
  • If you repost this article, please keep the link: https://go.coder-hub.com/76584578.html