January 9, 2023 00:26:42

CPU Cache Behaviour when Data Changes

Question

I was just wondering about how data changes affect the CPU cache.

Let's say I have the following C code:

int main() {
  int arr[16] = ...
  for (int i = 1; i < 16; i++) {
    arr[i] = arr[i] + arr[i-1];
  }
  for (int i = 0; i < 16; i++) {
    arr[i] += arr[i];
  }
}

How many times does the CPU have to reload the numbers in cache because of the memory writes in each of the loops?

Answer 1

Score: 4

The exact answer depends on the machine-specific details of the cache configuration. The only way to know for sure in general is to measure using the hardware counters and a tool like PAPI.

In general, however, a write from a core updates the copy of the line in that core's L1 cache, so a later read of the same address returns the updated value from cache without a miss (assuming the cache line hasn't been evicted in the meantime).
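
To make this concrete, here is a minimal sketch of a toy cache model that replays the loads and stores of the two loops and counts misses. Everything in it is an assumption made for illustration: the names `toy_cache`, `touch` and `replay_loops` are hypothetical, the line size is taken to be 64 bytes, elements are 4 bytes, the cache is write-allocate, and nothing is ever evicted. It starts cold and ignores the initialization of the array, so the misses it reports stand for the initial fill, wherever that really happens. It is not a model of any specific CPU.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u  /* assumed cache-line size in bytes */
#define ELEM_SIZE 4u   /* 4-byte int, as the answer assumes */
#define MAX_LINES 8u   /* toy capacity, more than this trace needs */

/* Toy model of one core's L1: it only remembers which lines are resident. */
typedef struct {
    uintptr_t tags[MAX_LINES];
    size_t    count;
    unsigned  misses;
} toy_cache;

/* Touch one address. In a write-allocate cache a load and a store behave
   the same here: a miss brings the line in, later accesses to it hit. */
void touch(toy_cache *c, uintptr_t addr) {
    uintptr_t tag = addr / LINE_SIZE;
    for (size_t i = 0; i < c->count; i++) {
        if (c->tags[i] == tag) return;  /* hit: line already resident */
    }
    c->misses++;
    if (c->count < MAX_LINES) c->tags[c->count++] = tag;
}

/* Replay the memory accesses of both loops for a 16-element array
   that starts at byte address `base`, and return the miss count. */
unsigned replay_loops(uintptr_t base) {
    toy_cache c = {0};
    for (int i = 1; i < 16; i++) {
        touch(&c, base + ELEM_SIZE * (uintptr_t)i);        /* load  arr[i]   */
        touch(&c, base + ELEM_SIZE * (uintptr_t)(i - 1));  /* load  arr[i-1] */
        touch(&c, base + ELEM_SIZE * (uintptr_t)i);        /* store arr[i]   */
    }
    for (int i = 0; i < 16; i++) {
        touch(&c, base + ELEM_SIZE * (uintptr_t)i);        /* load  arr[i] */
        touch(&c, base + ELEM_SIZE * (uintptr_t)i);        /* store arr[i] */
    }
    return c.misses;
}
```

Under these assumptions, `replay_loops(0)` (a line-aligned array) reports one miss and `replay_loops(32)` (an array straddling a line boundary) reports two. The stores themselves never cause an extra reload in this model.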

For the code you show (a 1-D array of 16 4-byte elements), you're only dealing with 64 bytes, which is one cache line on most modern processors (or two, depending on alignment). The array is therefore very likely to be loaded into L1 cache at the start, when you initialize the elements, and both loops will then operate in-cache (assuming there are no conflicting accesses from other threads).
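
The "one line or two, depending on alignment" point can be checked with a little address arithmetic. The helper below is a hypothetical illustration (the name `lines_spanned` is not from any library), and the line size is a parameter because 64 bytes is only the common case, not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines touched by `size` bytes starting at `addr`.
   `line_size` is supplied by the caller; 64 is a common value. */
size_t lines_spanned(uintptr_t addr, size_t size, size_t line_size) {
    if (size == 0 || line_size == 0) return 0;
    uintptr_t first = addr / line_size;               /* line of first byte */
    uintptr_t last  = (addr + size - 1) / line_size;  /* line of last byte  */
    return (size_t)(last - first + 1);
}
```

For the array in the question one could call `lines_spanned((uintptr_t)arr, sizeof arr, 64)`. A 64-byte array starting at a multiple of 64 gives 1, any other starting address gives 2. In C11, declaring the array with `_Alignas(64)` is one way to force the single-line case.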
