CPU Cache Behaviour when Data Changes
Question
I was just wondering about how data changes affect the CPU cache.
Let's say I have the following C code:
int main() {
    int arr[16] = ...
    for (int i = 1; i < 16; i++) {
        arr[i] = arr[i] + arr[i-1];
    }
    for (int i = 0; i < 16; i++) {
        arr[i] += arr[i];
    }
}
How many times does the CPU have to reload the numbers in cache because of the memory writes in each of the loops?
Answer 1
Score: 4
The exact answer depends on the machine-specific details of the cache configuration. The only way to know for sure on a given machine is to measure with the hardware performance counters, using a tool like PAPI.
However, in general, a write from a core updates the copy of the line in its L1 cache, so a later read of the same address returns the updated value from the cache without a miss (assuming the cache line hasn't been evicted in the meantime).
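To make the point about write hits concrete, here is a minimal sketch of a toy cache model in C. It is an illustration under stated assumptions, not a model of any real CPU: a direct-mapped cache with 64-byte lines, write-allocate and write-back behaviour, and 4-byte ints. The names `touch` and `loop_misses` and the addresses fed to them are hypothetical, and the model ignores prefetching, other code and other cores. It replays the question's initialization and both loops and counts misses, i.e. line loads from memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model, not a real CPU: direct-mapped, 64-byte lines,
 * write-allocate, write-back. Ignores prefetching, other code, other cores. */
#define LINE_SIZE 64
#define NUM_SETS  8
#define ELEM_SIZE 4   /* assumes a 4-byte int, as in the question */
#define N         16

static uintptr_t tags[NUM_SETS];
static int valid[NUM_SETS];
static int misses;

/* Reads and writes take the same path. A hit, whether read or write, just
 * uses or updates the cached copy, so nothing is reloaded (the model tracks
 * only tags, not data). A miss loads the whole line from memory. */
static void touch(uintptr_t addr) {
    uintptr_t line = addr / LINE_SIZE;
    size_t set = (size_t)(line % NUM_SETS);
    if (valid[set] && tags[set] == line)
        return;                 /* hit */
    misses++;                   /* miss: line loaded from memory */
    valid[set] = 1;
    tags[set] = line;
}

/* Replays the question's access pattern for an array at the hypothetical
 * address `base`, starting from an empty cache. *init_misses receives the
 * misses caused by initializing arr; the return value is the number of
 * misses caused by the two loops alone. */
int loop_misses(uintptr_t base, int *init_misses) {
    assert(base % ELEM_SIZE == 0);
    memset(valid, 0, sizeof valid);
    misses = 0;

    for (int i = 0; i < N; i++)              /* int arr[16] = ... (writes) */
        touch(base + ELEM_SIZE * i);
    *init_misses = misses;

    for (int i = 1; i < N; i++) {            /* arr[i] = arr[i] + arr[i-1] */
        touch(base + ELEM_SIZE * i);         /* read arr[i]   */
        touch(base + ELEM_SIZE * (i - 1));   /* read arr[i-1] */
        touch(base + ELEM_SIZE * i);         /* write arr[i]  */
    }
    for (int i = 0; i < N; i++) {            /* arr[i] += arr[i] */
        touch(base + ELEM_SIZE * i);         /* read arr[i]  */
        touch(base + ELEM_SIZE * i);         /* write arr[i] */
    }
    return misses - *init_misses;
}
```

Under these assumptions, `loop_misses(0x1000, &init)` returns 0 and sets `init` to 1 (the array fits in one line), while `loop_misses(0x1020, &init)` also returns 0 but sets `init` to 2, because the array straddles two lines. In neither case do the writes inside the loops cause a reload. Real hardware can behave differently, which is why measuring with hardware counters remains the reliable check.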
For the code you show (a 1-D array of 16 4-byte elements), you're only dealing with 64 bytes, which is 1 cache line on most modern processors (or 2, depending on alignment). So the array is very likely loaded into the L1 cache at the start, when you initialize the elements, and both loops then operate entirely in the cache (assuming there are no conflicting accesses from other threads).
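The "1 cache line, or 2 depending on alignment" point is simple address arithmetic. Below is a minimal sketch; the helper name `lines_spanned` is hypothetical, and the line size is a parameter because 64 bytes is an assumption that should be checked on your own CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines touched by the byte range [addr, addr + size).
 * line_size is an assumed parameter, e.g. 64 bytes. */
size_t lines_spanned(uintptr_t addr, size_t size, size_t line_size) {
    assert(size > 0 && line_size > 0);
    uintptr_t first = addr / line_size;              /* line of first byte */
    uintptr_t last  = (addr + size - 1) / line_size; /* line of last byte  */
    return (size_t)(last - first + 1);
}
```

For example, `lines_spanned(0, 64, 64)` is 1, while `lines_spanned(4, 64, 64)` is 2: a 64-byte array that starts 4 bytes into a line spills into the next one. Declaring the array as `_Alignas(64) int arr[16];` (C11) asks the compiler to start it on a 64-byte boundary, so it occupies a single line when `int` is 4 bytes.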