GPU (Nvidia) TLB misses

Question

There are plenty of documentation/publications on CUDA/Nvidia GPUs, but I never encountered anything about TLBs.

- Do GPUs use TLBs similar to CPUs (and, therefore, have TLB hits/misses)?
- How are TLB misses handled: by the CUDA driver or by GPU hardware?
- Are there cases where TLB misses cause a significant/noticeable performance impact?
Answer 1
Score: 1
A TLB does exist. I am not aware of any official documentation, but its size can be determined via reverse engineering. See, for example, Zhe Jia et al., Dissecting the NVidia Turing T4 GPU via Microbenchmarking:
> […] within the available global memory size, there are two levels of TLB on the Turing GPUs. The L1 TLB has 2 MiB page entries and 32 MiB coverage. The coverage of the L2 TLB is about 8192 MiB, which is the same as Volta.
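The quoted numbers imply an L1 TLB of roughly 32 MiB / 2 MiB = 16 entries, and they also explain why such sizes are measurable: once a pointer-chasing working set exceeds the TLB's coverage, the miss rate jumps sharply. The sketch below simulates that effect with a hypothetical LRU TLB using the quoted Turing L1 parameters (the LRU replacement policy and the sequential access pattern are assumptions for illustration, not documented NVIDIA behavior):

```python
from collections import OrderedDict

PAGE_BYTES = 2 * 1024 * 1024   # 2 MiB pages, per the quoted Turing numbers
TLB_ENTRIES = 16               # derived: 32 MiB coverage / 2 MiB per entry

def miss_rate(working_set_bytes, accesses=10_000):
    """Cycle through the pages of a working set against a simulated LRU TLB
    and return the fraction of accesses that miss."""
    tlb = OrderedDict()        # page number -> present, ordered by recency
    pages = max(1, working_set_bytes // PAGE_BYTES)
    misses = 0
    for i in range(accesses):
        page = i % pages       # sequential, page-strided access pattern
        if page in tlb:
            tlb.move_to_end(page)          # hit: refresh recency
        else:
            misses += 1                    # miss: install entry
            tlb[page] = True
            if len(tlb) > TLB_ENTRIES:
                tlb.popitem(last=False)    # evict least recently used
    return misses / accesses

for mib in (16, 32, 64):
    print(f"{mib:3d} MiB working set -> miss rate {miss_rate(mib << 20):.2f}")
```

Within the 32 MiB coverage the only misses are the initial compulsory ones; just past it, the cyclic pattern thrashes the LRU TLB and every access misses. Real GPU microbenchmarks detect the same cliff as a step increase in memory latency rather than a directly observable miss counter.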
Comments