Can an acq_rel operation be split into an acquire and a release operation?

Question

Consider this C++ statement:

foo.exchange(bar, std::memory_order_acq_rel);

Is the above statement exactly equivalent to any of the following?

1)

foo.exchange(bar, std::memory_order_acquire);
dummy.store(0, std::memory_order_release);

2)

dummy.store(0, std::memory_order_release);
foo.exchange(bar, std::memory_order_acquire);

3)

foo.exchange(bar, std::memory_order_release);
dummy.load(std::memory_order_acquire);

4)

dummy.load(std::memory_order_acquire);
foo.exchange(bar, std::memory_order_release);

If they are not equivalent, please explain why not.

Answer 1

Score: 3

For 1) and 2): no. Another thread that loads foo won't sync-with foo.exchange(acquire), because that exchange is only an acquire operation, not a release operation. So that other thread can't safely read the values of non-atomic assignments made before the exchange, or rely on seeing the values of earlier atomic stores.

3) and 4) have various problems with syncing (or failing to sync) with another writer or reader to create a happens-before relationship. That relationship is only created when one thread does an acquire load that reads the value from a release store in another thread. If the relevant side of the exchange is relaxed (the store side of an acquire exchange, the load side of a release exchange), that doesn't happen.

I don't know whether you're thinking of dummy.store(0, std::memory_order_release); as a 2-way barrier like std::atomic_thread_fence(std::memory_order_release), but it isn't. It's just a release operation on a dummy variable that (I assume) no other thread ever accesses.

See https://preshing.com/20120913/acquire-and-release-semantics/ for a description in terms of local reordering of accesses to coherent shared memory. Acquire and release operations can reorder in one direction each. The dummy release store can reorder with any later operations except ones that are themselves release or stronger, so it might as well not exist.

What would be approximately equivalent (strictly stronger I think) is:

// Earlier operations can't reorder past the fence,
std::atomic_thread_fence(std::memory_order_release);
// and later stores can't reorder before the fence,
foo.exchange(bar, std::memory_order_acquire);  // so this store comes after any earlier operations

The load part of the exchange can still reorder with earlier loads/stores on other objects so it's not much stronger. (related: https://stackoverflow.com/questions/65568185/for-purposes-of-ordering-is-atomic-read-modify-write-one-operation-or-two)
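As a rough sketch of the fence-based variant on the writer side, the following demo publishes a plain int through foo. The names payload, seen and run_fence_demo are made up for illustration, and the reader's spin loop is only there to make the outcome deterministic. It relies on the standard rule that a release fence sequenced before an atomic store or RMW synchronizes with an acquire load that reads the value that operation wrote.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: payload, seen and run_fence_demo are illustration names.
int run_fence_demo() {
    int payload = 0;              // plain, non-atomic data
    std::atomic<int> foo{0};
    int seen = -1;

    std::thread writer([&] {
        payload = 42;
        // The release fence orders the payload write before the store
        // half of the exchange below.
        std::atomic_thread_fence(std::memory_order_release);
        foo.exchange(1, std::memory_order_acquire);
    });
    std::thread reader([&] {
        // Spin until the acquire load reads the value written by the exchange.
        while (foo.load(std::memory_order_acquire) == 0) {
            std::this_thread::yield();
        }
        seen = payload;           // ordered after the writer's payload = 42
    });

    writer.join();
    reader.join();
    assert(seen == 42);
    return seen;
}
```

Dropping the fence would leave the exchange as a pure acquire operation, and the read of payload would then be a data race.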


Also fine would be foo.exchange(bar, std::memory_order_release); std::atomic_thread_fence(std::memory_order_acquire);

Another answer suggests foo.exchange(bar, release) ; foo.load(acquire) would be equivalent, but it's not. The acquire load might sync-with a different thread than the one whose value the exchange saw.

If you're really not using the return value of the exchange, either to check whether you should do something (if(sequence_num > x)) or to figure out what or where you should access (e.g. a pointer or array index), its acquire semantics are unlikely to matter at all.

But if we consider a reader like int idx = foo.exchange(bar, acq_rel); int tmp = arr[idx];, replacing the acq_rel exchange with int idx = foo.exchange(bar, release); foo.load(acquire); (ignoring the value of that acquire load) wouldn't be equivalent. Only an acquire barrier (fence) would order the load side of the exchange with respect to later operations.

If a store from a third thread becomes visible between the exchange(release) and load(acquire), you don't sync-with the thread that stored the value your exchange saw, only the third thread that stored the value you're ignoring.

Consider a writer that did arr[i] = 123; foo.store(i, release);
If a third thread did foo.store(0, relaxed); or something similar, the foo.load(acquire) would read the third thread's value rather than syncing with the thread that wrote arr[idx]. This is of course a contrived example, and dependency ordering would save you on real CPUs even though the load side of foo.exchange was relaxed, not consume. But ISO C++ formally guarantees nothing in that case. (And branching on the exchange result, instead of using it as part of a load or maybe store address, wouldn't let dependency ordering save you.)
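For comparison, here is a minimal sketch of the reader variant that does work: a release exchange followed by an acquire fence. The sentinel value -1, the array size and the function name run_reader_fence_demo are assumptions made for this example, and there is deliberately no third thread. The acquire fence applies to the value the exchange itself read, so it does not matter what a later load of foo would have seen.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: arr, the -1 sentinel and the function name are
// illustration only.
int run_reader_fence_demo() {
    int arr[4] = {0, 0, 0, 0};    // plain, non-atomic data
    std::atomic<int> foo{-1};     // -1 means "no index published yet"
    int tmp = 0;

    std::thread writer([&] {
        const int i = 2;
        arr[i] = 123;
        foo.store(i, std::memory_order_release);
    });
    std::thread reader([&] {
        int idx;
        do {
            // The load half of this exchange is relaxed.
            idx = foo.exchange(-1, std::memory_order_release);
        } while (idx < 0);
        // The fence upgrades that relaxed load: it now synchronizes with
        // the release store whose value the exchange read.
        std::atomic_thread_fence(std::memory_order_acquire);
        tmp = arr[idx];
    });

    writer.join();
    reader.join();
    assert(tmp == 123);
    return tmp;
}
```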

If the third thread was also using exchange (even relaxed), that would create a release-sequence so your load would still sync-with the earlier writer as well. But a pure store doesn't guarantee that, breaking a release-sequence.
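The release-sequence case can be sketched as follows, assuming a third thread that modifies foo only through a relaxed compare-exchange. The values 1 and 2 and all names here are hypothetical. Because the RMW reads the writer's release store and writes on top of it, it continues the release sequence, so a reader that acquires the RMW's value still synchronizes with the original writer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: data, seen and the values 1 and 2 are illustration only.
int run_release_sequence_demo() {
    int data = 0;                 // plain, non-atomic data
    std::atomic<int> foo{0};
    int seen = -1;

    std::thread writer([&] {
        data = 7;
        foo.store(1, std::memory_order_release);   // heads the release sequence
    });
    std::thread third([&] {
        // Relaxed RMW that turns 1 into 2; it continues the release sequence.
        int expected = 1;
        while (!foo.compare_exchange_weak(expected, 2, std::memory_order_relaxed)) {
            expected = 1;         // a failed CAS overwrote expected; reset it
        }
    });
    std::thread reader([&] {
        // Reads the value written by the relaxed RMW, yet still
        // synchronizes with the writer's release store.
        while (foo.load(std::memory_order_acquire) != 2) {
            std::this_thread::yield();
        }
        seen = data;
    });

    writer.join();
    third.join();
    reader.join();
    assert(seen == 7);
    return seen;
}
```

If the third thread used a plain foo.store(2, relaxed) instead, ISO C++ would no longer guarantee that the reader sees data == 7, and the read would be a data race.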

On most CPUs, where stores can only become visible to other threads by committing to coherent cache, the writer had to wait for exclusive ownership of the cache line just like for an atomic RMW. So plain stores can also continue a release-sequence, letting an acquire load sync-with all previous release stores and RMWs to the object. But ISO C++ doesn't formally guarantee that, and I wouldn't bet on it being safe on PowerPC where store-forwarding between logical cores is a thing. Except that on PPC, an acquire load is done with asm barriers, which would also strengthen the load part of an exchange.

Still, if you're trying to understand the C++ formalism, it's important to understand that the load whose value you actually use needs to be acquire, or there needs to be an acquire fence (not just an acquire operation).

Answer 2

Score: 2

Although the C++ memory model does not describe acquire/release semantics in terms of reordering, it's still a pretty good approximation. Acquire operations can be reordered with earlier operations, but not with later; release is the other way around.

It can be helpful visually to try it with cards on a table or something like that. Each card is a load/store/RMW operation, and you start with them in program order. Then the rule is that you may swap any two adjacent cards unless the left one is acquire, or the right one is release, or both.

In what's below, let X be your foo.exchange, which we will decorate as XA or XR according to whether it is acquire or release. Let DA/DR be the dummy acquire-load or release-store. Let P be any relaxed or non-atomic operation that is sequenced before both X and D, and Q another one that is sequenced after.

In the original version, we begin with simply P XAR Q. Since X is both acquire and release, it cannot be swapped with either P or Q. (It is possible for either P or Q to be reordered between the load and store within X, but that's not really relevant here.) So if in some replacement code there is any way to move either P or Q to the opposite side of X, then it is not equivalent to the original.

In #1 it is easy. You start with P XA DR Q, but P and XA can be immediately swapped because XA is only acquire.

In #2 it takes a little more. You start with P DR XA Q, and you cannot swap P with DR, nor XA with Q. But you can swap DR with XA, and then P with XA.

P DR XA Q
P XA DR Q
XA P DR Q

I leave #3 and #4 as exercises, as they have similar solutions.
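The card rule is simple enough to write down as code. The sketch below is only a toy model of the reordering approximation described above, not of the formal C++ memory model; the Op struct and the helper names are invented for this example. It replays the two swaps from #2.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One "card": an operation with optional acquire/release semantics.
struct Op {
    std::string name;
    bool acquire;
    bool release;
};

// Adjacent cards may swap unless the left one is acquire or the right one
// is release (or both).
bool can_swap(const Op& left, const Op& right) {
    return !left.acquire && !right.release;
}

// Swap ops[i] and ops[i + 1] if the rule allows it; report whether it happened.
bool try_swap(std::vector<Op>& ops, std::size_t i) {
    if (i + 1 >= ops.size() || !can_swap(ops[i], ops[i + 1])) return false;
    std::swap(ops[i], ops[i + 1]);
    return true;
}

// Join the card names with spaces, e.g. "P DR XA Q".
std::string render(const std::vector<Op>& ops) {
    std::string out;
    for (const Op& op : ops) {
        if (!out.empty()) out += ' ';
        out += op.name;
    }
    return out;
}

// Replays case #2: P DR XA Q -> P XA DR Q -> XA P DR Q.
std::string replay_case2() {
    std::vector<Op> ops = {
        {"P", false, false}, {"DR", false, true},
        {"XA", true, false}, {"Q", false, false}};
    bool ok = try_swap(ops, 1);   // DR <-> XA
    ok = try_swap(ops, 0) && ok;  // P <-> XA
    assert(ok);
    return render(ops);
}
```

With the original P XAR Q, can_swap returns false for both neighbouring pairs, which is why neither P nor Q can cross X.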

Answer 3

Score: 1

The operations are completely different for a simple reason. A release operation on variable a is not equivalent in any way to a release operation on variable b. To synchronize with the thread, one would need to call acquire on variable b rather than a. That's the difference. Yes, memory-order instructions are tied to variables.

So replacing acq_rel with a weaker instruction on foo plus an instruction on dummy will not properly synchronize with threads that call either acquire or release on foo, depending on which instruction was called on foo.

However, if you called a discarded load on foo in addition to the exchange, each with the complementing memory order, the effect would be pretty much equivalent. You could also call a general fence, which would trigger a stronger synchronization instruction.

huangapple
  • Published on March 7, 2023 at 17:07:22
  • If you repost this article, please keep this link: https://go.coder-hub.com/75659907.html