Posted: 2023-06-30 04:58:41

Shared memory with Metal compute shaders (how to access data outside of shared thread group memory?)

Question

I've written a Metal compute shader that:

- accepts a 2-D Metal texture
- writes to two Metal textures
  - one with vertical motion blur
  - one with horizontal motion blur

Implementation:

constexpr sampler s(coord::pixel, address::clamp_to_edge);
kernel void motion_blur(texture2d<half, access::sample> gray_source [[ texture(0) ]],
                        texture2d<half, access::write> b_hor [[ texture(1) ]],
                        texture2d<half, access::write> b_ver [[ texture(2) ]],
                        uint2 globalId [[ thread_position_in_grid ]]) {
    float2 c = static_cast<float2>(globalId);

    // calculate the value of a motion-blurred image.
    half filter_len = 15;
    int lower_bound = int(floor(filter_len / 2.0)) * -1.0;
    int upper_bound = int(floor(filter_len / 2.0) + 1.0);
    half g_x = 0;
    half g_y = 0;
    for (int i = lower_bound; i < upper_bound; i++) {
        half xGray = gray_source.sample(s, c, int2(i, 0)).x / filter_len;
        g_x += xGray;
        half yGray = gray_source.sample(s, c, int2(0, i)).x / filter_len;
        g_y += yGray;
    }
    b_hor.write(g_x, globalId);
    b_ver.write(g_y, globalId);
}

Above, the filter length is set to 15, but I need a filter length of ~30.

Experimentally, filter lengths of greater than 15 do not increase the amount of motion blur generated as I would expect.

I'm not sure, but I have a hunch that this is related to my threadgroup size:

threadgroupSize = MTLSize(width: 16, height: 16, depth: 1)

but increasing the width and height to 32 does not have the desired effect either. I'm guessing that's due to hardware limitations.

I'm new to shader programming, and have even less experience with Metal. What can I do to give each thread access to larger portions of the texture?

Answer 1

Score: 0

Thanks to a pointer from a very helpful and friendly engineer, I now know what my issue was. In the following line, I was abusing the offset of the sampler:

half xGray = gray_source.sample(s, c, int2(i, 0)).x / filter_len;

I guess only offset values from -8 to 7 will work. I'm looking through the Metal Shading Language specification for this and will report back if I find it. That said, passing the full coordinate instead of an offset, as in the line below, works just fine:

half xGray = gray_source.sample(s, float2(c.x + i, c.y)).x / filter_len;

I was just confused about the correct way to specify the coordinate I wanted to sample.
