Higher bandwidth in texture memory compared to global memory

Question

Texture memory is a part of global memory; it is cached and read-only. For a 2D heat stencil problem, much of the literature suggests using texture memory. With texture memory, the time taken (gpu__time_duration.sum) decreases, but the memory throughput (dram__bytes.sum.per_second) increases too. Why does the memory throughput increase? In the end the data is still fetched from the slower global memory, so why would the throughput go up? I cannot find any literature that explains this.

Answer 1

Score: 1

As mentioned in the comments, texture memory goes through a different cache than a plain buffer in global memory: "The texture cache is optimized for 2D spatial locality, so threads of the same warp that read texture or surface addresses that are close together in 2D will achieve best performance."
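To make the 2D locality point concrete, here is a minimal sketch of one heat-stencil step that reads its input through a texture object. It is not the asker's code: the names `heat_step_tex` and `run_heat_step_check`, the coefficient `alpha`, the 16x16 block size, and the clamped borders are all assumptions chosen for illustration.

```cuda
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one Jacobi step of a 2D heat stencil, input read via a texture.
// Threads of a warp fetch texels that are close together in 2D.
__global__ void heat_step_tex(cudaTextureObject_t src, float* dst,
                              int width, int height, float alpha)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Unnormalized coordinates with point filtering: texel i is addressed at i + 0.5.
    // Clamp addressing makes out-of-range reads return the nearest edge texel.
    float c = tex2D<float>(src, x + 0.5f, y + 0.5f);
    float l = tex2D<float>(src, x - 0.5f, y + 0.5f);
    float r = tex2D<float>(src, x + 1.5f, y + 0.5f);
    float u = tex2D<float>(src, x + 0.5f, y - 0.5f);
    float d = tex2D<float>(src, x + 0.5f, y + 1.5f);
    dst[y * width + x] = c + alpha * (l + r + u + d - 4.0f * c);
}

// Hypothetical helper: runs one step on the GPU and returns the maximum absolute
// difference against a CPU reference (INFINITY if the kernel failed).
float run_heat_step_check(int width, int height)
{
    const float alpha = 0.1f;  // assumed diffusion coefficient
    const size_t n = static_cast<size_t>(width) * height;
    std::vector<float> h_in(n), h_out(n);
    for (size_t i = 0; i < n; ++i) h_in[i] = float((i * 37) % 101) / 101.0f;

    // Textures are backed here by a CUDA array, which uses a 2D-friendly layout.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, width, height);
    cudaMemcpy2DToArray(arr, 0, 0, h_in.data(), width * sizeof(float),
                        width * sizeof(float), height, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc tex = {};
    tex.addressMode[0] = cudaAddressModeClamp;
    tex.addressMode[1] = cudaAddressModeClamp;
    tex.filterMode = cudaFilterModePoint;      // no interpolation, exact texel values
    tex.readMode = cudaReadModeElementType;
    tex.normalizedCoords = 0;

    cudaTextureObject_t texObj = 0;
    cudaCreateTextureObject(&texObj, &res, &tex, nullptr);

    float* d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(float));
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    heat_step_tex<<<grid, block>>>(texObj, d_out, width, height, alpha);
    cudaError_t status = cudaDeviceSynchronize();
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(texObj);
    cudaFreeArray(arr);
    cudaFree(d_out);
    if (status != cudaSuccess) return INFINITY;

    // CPU reference with the same clamped borders.
    auto at = [&](int xx, int yy) {
        xx = std::min(std::max(xx, 0), width - 1);
        yy = std::min(std::max(yy, 0), height - 1);
        return h_in[yy * width + xx];
    };
    float max_err = 0.0f;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float c = at(x, y);
            float ref = c + alpha * (at(x - 1, y) + at(x + 1, y) +
                                     at(x, y - 1) + at(x, y + 1) - 4.0f * c);
            max_err = std::fmax(max_err, std::fabs(ref - h_out[y * width + x]));
        }
    return max_err;
}

int main()
{
    std::printf("max abs error vs CPU: %g\n", run_heat_step_check(64, 48));
    return 0;
}
```

Each thread's five reads land on neighbouring texels in both x and y, which is the access pattern the quoted passage describes. Whether this beats a plain global-memory version depends on the GPU and the problem size, so it is worth profiling both.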

When profiling with Nsight, you can open the Memory Statistics to see how your code is actually using the caches.

Also, because textures are read-only, the hardware can use a different caching strategy, which allows higher bandwidth.

So the speedup will depend on how you access your memory.
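It may also help to write out how the two metrics from the question relate. This is only a sketch: it assumes that dram__bytes.sum.per_second is the DRAM byte count divided by the elapsed time, and the symbols B and t and the numbers below are hypothetical, not measurements.

```latex
% B = bytes moved to/from DRAM (dram__bytes.sum)
% t = kernel duration (gpu__time_duration.sum)
\[
  \text{throughput} = \frac{B}{t}
\]
% If the texture path moves about the same number of bytes in less time,
% B_tex \approx B_glob and t_tex < t_glob, then
\[
  \frac{B_{\text{tex}}}{t_{\text{tex}}} > \frac{B_{\text{glob}}}{t_{\text{glob}}}
\]
% Hypothetical numbers: B = 1 GB in both cases
\[
  \frac{1\ \text{GB}}{10\ \text{ms}} = 100\ \text{GB/s}
  \qquad\text{vs.}\qquad
  \frac{1\ \text{GB}}{8\ \text{ms}} = 125\ \text{GB/s}
\]
```

Under these assumptions a higher DRAM throughput does not mean more data was fetched. It means roughly the same data was fetched in less time, because the kernel spent less time stalled on memory. If the cache also reduced B, the throughput could move in either direction.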

huangapple
  • Posted on March 7, 2023 at 23:08:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75663728.html