Does Intel Cache Allocation Technology allow hits from CPUs in one group on cache lines in another group?


Question


In the MESI protocol, when a cache line needs to be loaded into a cache, the CPU issues a PrRd. If the line misses, a BusRd is placed on the bus; the other caches snoop the BusRd and check whether they hold a valid copy. If one does, that cache supplies the value.

Now, Intel CAT (Cache Allocation Technology) provides a way to partition LLC usage between CPUs. For example, CPU1 uses the first 8 ways and CPU2 uses the next 8 ways. My question is: if CPU1 now needs to load a cache line that sits in CPU2's partition, will CPU2 supply that copy instead of the line being loaded from main memory?

Answer 1

Score: 2


Yes. CAT is not a form of NUMA; the address space is still shared.
It's just a micro-architectural feature that helps you control cache occupancy so that threads interfere less with each other (or gain access to more caching opportunities, depending on how you allocate the masks).

If you didn't return data from the other thread's partition, you would lose coherence (what if the line is modified? You can't return stale data from memory in that case).
Think of it like this: each thread can look up the entire cache, but allocates only into its own partition (this could easily be implemented by modifying the LRU and victim selection).
This way you get full control over private lines, and only shared lines will be placed in the partition of whichever thread accessed them first. Close enough to deliver the QoS the feature was designed for.

One open implementation question could be this: what happens when you allocate a line in one partition but it then continues to be used only by the other thread? Will it eventually get migrated to the other partition? My guess is no, simply because it's too much of a hassle to detect and organize.

huangapple
  • Posted on 2023-04-04 17:40:06
  • Please keep this link when reposting: https://go.coder-hub.com/75927834.html