GPU (Nvidia) TLB misses

Question

There is plenty of documentation and many publications on CUDA/Nvidia GPUs, but I have never encountered anything about TLBs.

  1. Do GPUs use TLBs similar to CPUs (and, therefore, have TLB hits/misses)?

  2. How are TLB misses handled: by the CUDA driver or by the GPU hardware?

  3. Are there cases where TLB misses cause a significant/noticeable performance impact?

Answer 1

Score: 1

A TLB does exist. I am not aware of any official documentation, but its size can be determined via reverse engineering. See, for example, Zhe Jia et al.: Dissecting the NVidia Turing T4 GPU via Microbenchmarking

> […] within the available global memory size, there are
two levels of TLB on the Turing GPUs. The L1 TLB has 2 MiB page entries and
32 MiB coverage. The coverage of the L2 TLB is about 8192 MiB, which is the
same as Volta.
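
To make the quoted figures concrete, here is a minimal sketch of why a coverage limit becomes visible to a microbenchmark. It is not the paper's code and does not run on a GPU. It assumes that coverage equals entries times page size (which would put the L1 TLB at 16 entries), and it models the TLB as fully associative with LRU replacement, which is purely an illustrative assumption about hardware that is not publicly documented. The function name `miss_rate` and all constants other than the 2 MiB and 32 MiB figures are hypothetical.

```python
from collections import OrderedDict

MIB = 1024 * 1024
PAGE_SIZE = 2 * MIB      # L1 TLB page-entry size from the quote
L1_COVERAGE = 32 * MIB   # L1 TLB coverage from the quote

# Assumption: coverage = entries * page size.
L1_ENTRIES = L1_COVERAGE // PAGE_SIZE


def miss_rate(footprint_bytes, entries=L1_ENTRIES, page_size=PAGE_SIZE, passes=4):
    """Steady-state miss rate of a toy fully associative LRU TLB.

    One access per page, cycling over the footprint, mimics a pointer
    chase with a stride of one page. The first pass only warms the TLB
    and is not counted.
    """
    tlb = OrderedDict()  # page number -> None, ordered oldest to newest
    pages = footprint_bytes // page_size
    misses = accesses = 0
    for p in range(passes):
        for page in range(pages):
            hit = page in tlb
            if hit:
                tlb.move_to_end(page)        # mark as most recently used
            else:
                if len(tlb) >= entries:
                    tlb.popitem(last=False)  # evict least recently used
                tlb[page] = None
            if p > 0:
                accesses += 1
                misses += not hit
    return misses / accesses


if __name__ == "__main__":
    for mib in (16, 32, 34, 64):
        print(f"{mib:3d} MiB footprint: miss rate {miss_rate(mib * MIB):.2f}")
```

In this toy model the miss rate stays at 0 up to a 32 MiB footprint and jumps to 1 as soon as the footprint exceeds it. A real microbenchmark would measure access latency rather than misses directly, and the step may be less sharp if the hardware uses a different replacement policy or associativity.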

Posted on February 14, 2023, 01:56:53