Does Intel Cache Allocation Technology allow hits from CPUs in one group on cache lines in another group?


Question


In the MESI protocol, when a cache line needs to be loaded into a cache, the CPU issues a PrRd. If the line misses, a BusRd is placed on the bus; the other caches snoop the BusRd and check whether they hold a valid copy. If one does, that cache supplies the value.

Now, Intel CAT (Cache Allocation Technology) provides a way to partition LLC usage between CPUs. For example, CPU1 uses the first 8 ways and CPU2 uses the next 8 ways. My question is: if CPU1 now needs to load a cache line that sits in CPU2's partition, will CPU2 supply that copy instead of the line being loaded from main memory?

Answer 1

Score: 2


Yes. CAT is not a form of NUMA; the address space is still shared.
It's just a micro-architectural feature that helps you control cache occupancy so that threads interfere less with each other (or gain access to more caching opportunities, depending on how you allocate the masks).

If you didn't return data from the other thread's partition, you would lose coherence (what if the line is modified? You can't return stale data from memory in that case).
Think of it like this: each thread can look up the entire cache, but allocates only into its own partition (this could easily be implemented by modifying the LRU and victim selection).
This way you get full control over private lines, and only shared lines will be placed in the partition of whichever thread accessed them first. Close enough to deliver the QoS the feature was designed for.

One open implementation question could be this: what happens when you allocate a line in one partition but it then continues to be used only by the other thread? Will it eventually get migrated to the other partition? My guess is no, simply because it's too much of a hassle to detect and organize.

huangapple
  • Posted on 2023-04-04 17:40:06
  • Please keep this link when reposting: https://go.coder-hub.com/75927834.html