Read optimizations on shared memory
Question
Suppose you have a function that makes several read accesses to a shared variable whose access is atomic. Everything runs in the same process: imagine threads of a process, or software running on a bare-metal platform with no MMU.
As a requirement, the value read must stay consistent for the whole duration of the function, so the code must not re-read the memory location and must instead keep the value in a local variable or a register. How can we ensure that this behaviour is respected?
As an example...
> shared is the only shared variable
extern uint32_t a, b, shared;
void useless_function(void)
{
    __ASM volatile ("":::"memory");
    uint32_t value = shared;
    a = value * 2;
    b = value << 3;
}
Can value be optimized out and replaced by direct reads of the shared variable in some contexts? If yes, how can I be sure this cannot happen?
Answer 1
Score: 1
> As a requirement, the value read must stay consistent for the whole duration of the function, so the code must not re-read the memory location and must instead keep the value in a local variable or a register. How can we ensure that this behaviour is respected?
You can do that with the READ_ONCE macro from the Linux kernel:
/*
* Prevent the compiler from merging or refetching reads or writes. The
* compiler is also forbidden from reordering successive instances of
* READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some
* particular ordering. One way to make the compiler aware of ordering is to
* put the two invocations of READ_ONCE or WRITE_ONCE in different C
* statements.
*
* These two macros will also work on aggregate data types like structs or
* unions. If the size of the accessed data type exceeds the word size of
* the machine (e.g., 32 bits or 64 bits) READ_ONCE() and WRITE_ONCE() will
* fall back to memcpy(). There's at least two memcpy()s: one for the
* __builtin_memcpy() and then one for the macro doing the copy of variable
* - '__u' allocated on the stack.
*
* Their two major use cases are: (1) Mediating communication between
* process-level code and irq/NMI handlers, all running on the same CPU,
* and (2) Ensuring that the compiler does not fold, spindle, or otherwise
* mutilate accesses that either do not require ordering or that interact
* with an explicit memory barrier or atomic instruction that provides the
* required ordering.
*/
E.g.:
uint32_t value = READ_ONCE(shared);
The READ_ONCE macro essentially casts the object you read to volatile, because the compiler cannot emit extra reads or writes for volatile objects.
The above is equivalent to:
uint32_t value = *(uint32_t volatile*)&shared;
Alternatively:
uint32_t value;
memcpy(&value, &shared, sizeof value);
memcpy is meant to break the dependency between shared and value, so that the compiler does not re-load shared instead of using value.
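The volatile-cast approach described above can be sketched as a small self-contained translation unit. This is only an illustration under stated assumptions: READ_ONCE_U32 is a hypothetical name for a simplified 32-bit stand-in, not the kernel's actual READ_ONCE macro, and a, b and shared are defined here instead of being declared extern so that the sketch links on its own.

```c
#include <stdint.h>

/* Hypothetical helper (not the kernel macro): reads a uint32_t object
 * through a volatile-qualified lvalue, so the compiler has to perform
 * this access exactly as written, once. */
#define READ_ONCE_U32(x) (*(const volatile uint32_t *)&(x))

/* Assumption: defined here rather than extern, to keep the sketch
 * self-contained. */
uint32_t a, b, shared;

void useless_function(void)
{
    /* Single load of shared; all later uses work on the local copy. */
    uint32_t value = READ_ONCE_U32(shared);

    a = value * 2;
    b = value << 3;
}
```

Both a and b are then derived from the same snapshot of shared, even if another thread or an interrupt handler changes shared while the function runs.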
Answer 2
Score: 0
In the example given, you are not using the variable value in the function at all, so it will definitely be optimised out.
Also, as mentioned in the comments, in a multitasking system the value of shared can be changed while the function is running.
> What I need is that shared is read only once and its local value kept for the whole length of the function and not re-evaluated
I would suggest something like this:
extern uint32_t a, b, shared;
void useless_function(void)
{
    __ASM volatile ("":::"memory");
    uint32_t value = shared;
    a = value * 2;
    b = value << 3;
}
Here shared is read only once in the function. It will be read again on the next call of the function.