Higher bandwidth in texture memory compared to global memory
Question
Texture memory is part of global memory. It is cached and read-only. For a 2D stencil heat problem, much of the literature suggests using texture memory. With texture memory, the time taken (gpu__time_duration.sum) decreases, but the memory throughput (dram__bytes.sum.per_second) increases too. Why does the memory throughput increase? The data is ultimately fetched from the slower global memory, so why would the throughput go up? I cannot find any literature that explains this.
Answer 1
Score: 1
As noted in the comments, texture memory goes through a different cache than a plain buffer in global memory: "The texture cache is optimized for 2D spatial locality, so threads of the same warp that read texture or surface addresses that are close together in 2D will achieve best performance."
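To make the 2D-locality point concrete, here is a minimal sketch of one heat-stencil step that reads its input through a texture object. It is not the asker's code: the kernel name `heatStepTex`, the 256x256 grid, the coefficient `alpha = 0.25f`, and the clamped (edge-repeating) boundary are all assumptions chosen for illustration. Each thread fetches its own cell and its four neighbours, so threads in the same warp touch addresses that are close together in 2D.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One explicit step of the 2D heat stencil. The input is read through a
// texture object; the output goes to an ordinary global-memory buffer.
// (Hypothetical kernel, for illustration only.)
__global__ void heatStepTex(cudaTextureObject_t src, float* dst,
                            int nx, int ny, float alpha)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nx || y >= ny) return;

    // Unnormalized coordinates; +0.5f addresses the texel centre.
    float fx = x + 0.5f, fy = y + 0.5f;
    float c = tex2D<float>(src, fx,        fy);
    float l = tex2D<float>(src, fx - 1.0f, fy);
    float r = tex2D<float>(src, fx + 1.0f, fy);
    float u = tex2D<float>(src, fx,        fy - 1.0f);
    float d = tex2D<float>(src, fx,        fy + 1.0f);

    // Assumed boundary handling: clamp addressing repeats the edge value
    // for out-of-range neighbours.
    dst[y * nx + x] = c + alpha * (l + r + u + d - 4.0f * c);
}

int main()
{
    const int nx = 256, ny = 256;              // assumed grid size
    std::vector<float> h(nx * ny, 0.0f);
    h[(ny / 2) * nx + nx / 2] = 100.0f;        // a single hot cell

    // Textures with 2D locality are backed by a CUDA array.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, nx, ny);
    cudaMemcpy2DToArray(arr, 0, 0, h.data(), nx * sizeof(float),
                        nx * sizeof(float), ny, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc td = {};
    td.addressMode[0]   = cudaAddressModeClamp;
    td.addressMode[1]   = cudaAddressModeClamp;
    td.filterMode       = cudaFilterModePoint;      // no interpolation
    td.readMode         = cudaReadModeElementType;  // return raw floats
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    float* d_out = nullptr;
    cudaMalloc(&d_out, nx * ny * sizeof(float));

    dim3 block(16, 16);
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    heatStepTex<<<grid, block>>>(tex, d_out, nx, ny, 0.25f);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h.data(), d_out, nx * ny * sizeof(float),
               cudaMemcpyDeviceToHost);
    printf("centre after one step: %f\n", h[(ny / 2) * nx + nx / 2]);

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(d_out);
    return 0;
}
```

Profiling this kernel next to an equivalent one that reads `src` from a plain `float*` is one way to compare gpu__time_duration.sum and dram__bytes.sum.per_second for the two access paths.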
When profiling with Nsight, you can open the Memory Statistics to see how your code is actually using the caches.
In addition, because a texture is read-only, the hardware can use a different caching strategy, which allows higher bandwidth.
So the speedup will depend on how you access your memory.
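One way to read the two metrics from the question together, offered as a hedged interpretation rather than something the answer states: dram__bytes.sum.per_second is a rate, that is, DRAM traffic divided by elapsed time. Assuming the texture version moves roughly the same number of bytes $B$ through DRAM (the symbols $B$ and $t$ are introduced here only for illustration) but finishes in less time, the reported rate has to rise:

$$
\text{throughput} = \frac{B}{t}, \qquad t_{\text{tex}} < t_{\text{global}} \;\Longrightarrow\; \frac{B}{t_{\text{tex}}} > \frac{B}{t_{\text{global}}}
$$

Comparing dram__bytes.sum (the byte count, without the per_second suffix) between the two runs would show whether that assumption holds for a given kernel.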