Question

Texture memory resides in global (device) memory; it is read-only and accessed through a cache. For a 2D stencil heat problem, much of the literature suggests using texture memory. When I use it, the time taken (gpu__time_duration.sum) decreases, but the memory throughput (dram__bytes.sum.per_second) increases too. Why does the memory throughput increase? The data is ultimately still fetched from the slower global memory, so why would the throughput go up? I cannot find any literature that explains this.

Answer 1

Score: 1

As noted in the comments, texture memory uses a different cache than a plain buffer in global memory: "The texture cache is optimized for 2D spatial locality, so threads of the same warp that read texture or surface addresses that are close together in 2D will achieve best performance."

When profiling with Nsight, you can open the Memory Statistics to see how your code is "using" the caches.

In addition, because a texture is read-only, the hardware can use a different caching strategy, which allows higher bandwidth.

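So the speedup will depend on how you access your memory. Note also that dram__bytes.sum.per_second is a rate: if the kernel moves roughly the same number of bytes in less time, the reported throughput goes up.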
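
To make the 2D-locality point concrete, here is a minimal sketch of one step of a 2D heat stencil that reads its input through a texture object. It is not taken from the question: the kernel name `heatStepTex`, the grid size, the coefficient `alpha`, and the clamped (replicated-edge) boundary handling are all assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error: %s\n",                       \
                         cudaGetErrorString(err_));                        \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// One explicit update step of a 5-point heat stencil (hypothetical kernel).
// Input is read through the texture path; output goes to a plain buffer.
__global__ void heatStepTex(cudaTextureObject_t src, float* dst,
                            int width, int height, float alpha)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Unnormalized coordinates: texel (x, y) is sampled at its centre.
    // Threads of a warp fetch texels that are close together in 2D.
    // With clamp addressing, out-of-range reads return the edge texel.
    float c = tex2D<float>(src, x + 0.5f, y + 0.5f);
    float l = tex2D<float>(src, x - 0.5f, y + 0.5f);
    float r = tex2D<float>(src, x + 1.5f, y + 0.5f);
    float u = tex2D<float>(src, x + 0.5f, y - 0.5f);
    float d = tex2D<float>(src, x + 0.5f, y + 1.5f);

    dst[y * width + x] = c + alpha * (l + r + u + d - 4.0f * c);
}

int main()
{
    const int width = 256, height = 256;  // assumed grid size
    const float alpha = 0.25f;            // assumed diffusion coefficient

    std::vector<float> host(width * height, 0.0f);
    const int centre = (height / 2) * width + width / 2;
    host[centre] = 100.0f;                // single hot cell

    // Copy the grid into a CUDA array, the storage layout used for 2D textures.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t array;
    CHECK(cudaMallocArray(&array, &desc, width, height));
    CHECK(cudaMemcpy2DToArray(array, 0, 0, host.data(),
                              width * sizeof(float), width * sizeof(float),
                              height, cudaMemcpyHostToDevice));

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = array;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;      // no interpolation
    texDesc.readMode = cudaReadModeElementType;    // return raw floats
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    CHECK(cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr));

    float* dOut = nullptr;
    CHECK(cudaMalloc(&dOut, host.size() * sizeof(float)));

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    heatStepTex<<<grid, block>>>(tex, dOut, width, height, alpha);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(host.data(), dOut, host.size() * sizeof(float),
                     cudaMemcpyDeviceToHost));

    std::printf("centre after one step: %f\n", host[centre]);

    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaFreeArray(array));
    CHECK(cudaFree(dOut));
    return 0;
}
```

Comparing this against an otherwise identical kernel that reads from a plain `float*` buffer, with the two metrics named in the question (for example via `ncu --metrics gpu__time_duration.sum,dram__bytes.sum.per_second`), should show whether the byte count or only the duration changes on a given GPU.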
