Atomic fence concurrency between Goroutines and C threads - what are the semantics?













I'm wondering if it is possible to coordinate atomic operation concurrency between goroutines and C threads explicitly.

The use case here involves an audio processing library in C, which creates an OS thread, and periodically calls a user-supplied callback to retrieve audio data. This must happen in almost real-time, so I don't want to incur the overhead of cgo calls, stack swaps, and Go-land concurrency. A ring buffer can solve this problem in general, where one thread writes to the buffer, another reads, and synchronization is performed with memory fences.
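As a rough illustration of the ring buffer described above, here is a minimal single-producer/single-consumer sketch in pure Go. The `ring` type and the `push`, `pop`, and `transfer` functions are hypothetical names, both ends are goroutines rather than a C thread, and the code assumes that `sync/atomic` loads and stores order the surrounding plain buffer accesses, which is the very property this question asks about.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// ring is a hypothetical single-producer/single-consumer byte queue.
// len(buf) must be a power of two so that index masking stays correct
// when the counters wrap around.
type ring struct {
	buf  []byte
	head uint32 // count of bytes consumed; written only by the consumer
	tail uint32 // count of bytes produced; written only by the producer
}

func newRing(size int) *ring { return &ring{buf: make([]byte, size)} }

// push copies b into the next free slot, then publishes it by advancing tail.
// It reports false when the buffer is full.
func (r *ring) push(b byte) bool {
	tail := atomic.LoadUint32(&r.tail)
	if tail-atomic.LoadUint32(&r.head) == uint32(len(r.buf)) {
		return false
	}
	r.buf[tail&uint32(len(r.buf)-1)] = b
	atomic.StoreUint32(&r.tail, tail+1)
	return true
}

// pop reads the oldest byte, then frees its slot by advancing head.
// It reports false when the buffer is empty.
func (r *ring) pop() (byte, bool) {
	head := atomic.LoadUint32(&r.head)
	if head == atomic.LoadUint32(&r.tail) {
		return 0, false
	}
	b := r.buf[head&uint32(len(r.buf)-1)]
	atomic.StoreUint32(&r.head, head+1)
	return b, true
}

// transfer sends the bytes 0..n-1 (mod 256) through an 8-slot ring and
// returns the sum seen by the consumer.
func transfer(n int) int {
	r := newRing(8)
	go func() {
		for i := 0; i < n; i++ {
			for !r.push(byte(i)) {
				runtime.Gosched() // buffer full: let the consumer run
			}
		}
	}()
	sum := 0
	for got := 0; got < n; {
		if b, ok := r.pop(); ok {
			sum += int(b)
			got++
		} else {
			runtime.Gosched() // buffer empty: let the producer run
		}
	}
	return sum
}

func main() {
	fmt.Println(transfer(1000))
}
```

Each counter has exactly one writer, so the only cross-thread communication is "data write, then counter store" on one side and "counter load, then data read" on the other. In the real use case the consumer half would live in the C callback and use the GCC `__atomic` builtins on the same two counters.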

However, it appears that the memory semantics of atomic operations in Go are currently left completely undefined in the docs, and are therefore utterly useless for this purpose, and probably many others... (https://golang.org/pkg/sync/atomic/ unhelpfully just says "atomic"; see https://github.com/golang/go/issues/5045)

But - it has to work in some way, even if that's not documented. How?

PLEASE NOTE I am not asking about solutions to the problem I describe, however. I am not asking if ring buffers are the correct choice, or if I should "communicate by sharing" or whatever. I am asking after the currently implemented memory order semantics of atomic operations in Go (say, the latest release version - 1.16.5 for concreteness).

In particular, here is a sample program which sets up a similar situation to what occurs in my actual use case:

package main

/*
#include <pthread.h>
#include <malloc.h>

typedef struct {
   int fence_0;
   char *data;
} shared_data;

shared_data *make_shared_data() {
   shared_data *sd = calloc(sizeof(shared_data), 1);
   sd->data = calloc(1024,1);
   sd->data[0] = 17;
   return sd;
}

void *get_shared_data_ptr(shared_data *sd) {
   return sd->data;
}

int read_data_in_pthread(shared_data *sd) {
   int l;
   __atomic_load(&sd->fence_0, &l, __ATOMIC_ACQUIRE);
   if (l < 2) return 0;
   return sd->data[0] + sd->data[1023];
}
*/
import "C"
import (
   "fmt"
   "reflect"
   "runtime"
   "sync/atomic"
   "unsafe"
)

func main() {
   // Prevent thread/cache switching (to avoid asking a third, unimportant question and allow the below "naughty")
   runtime.LockOSThread()

   // Allocate a C-owned structure.
   csd := C.make_shared_data()

   // This is just an expedient for the sake of this example, I'm aware it's naughty/bad, etc.
   ptr := (*byte)(C.get_shared_data_ptr(csd))
   arrptr := &reflect.SliceHeader{Data: uintptr(unsafe.Pointer(ptr)), Len: 1024, Cap: 1024}
   arr := *(*[]byte)(unsafe.Pointer(arrptr))

   fmt.Printf("%d\n", arr[0])
   done := make(chan bool)

   // Repeatedly execute a reader function in a cgo thread which will output zero if first fence is not 2
   // and output the sum of the first and last data points if it is.
   go func() {
      var s uint8
      s = 0
      for s == 0 {
         s = uint8(C.read_data_in_pthread(csd))
      }
      fmt.Printf("finished: %d\n", s)
      done <- true
   }()

   // Writer: fill the buffer between two atomic stores to the fence.
   go func() {
      atomic.StoreInt32((*int32)(&csd.fence_0), 1)
      for i := 0; i < 1024; i++ {
         arr[i] = 255
      }
      atomic.StoreInt32((*int32)(&csd.fence_0), 2)
   }()

   // Wait for the reader to report its result.
   <-done
}

The question is: (a) Can the output of this program ever be 17? (b) If not, must the output of this program always be 254, or might it be 255?

If the Go atomic stores follow a memory model similar to GCC's __ATOMIC_SEQ_CST, the stores are sequentially consistent and every write to the data made before the store of 2 is visible to the reader, so we'll always see 254. This would seem to be a sensible default. But is it necessarily true?
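For comparison, the same publish pattern can be written in pure Go, with no C thread involved. This is only a sketch: `publishAndRead` is a made-up name, and it shows what the program relies on (the reader seeing every data write that preceded the store of 2), not what cgo guarantees across the language boundary.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// publishAndRead mirrors the sample program with two goroutines: the writer
// fills the buffer between two atomic stores, and the reader waits for the
// second store before touching the data.
func publishAndRead() uint8 {
	var fence int32
	data := make([]byte, 1024)
	data[0] = 17

	go func() {
		atomic.StoreInt32(&fence, 1)
		for i := range data {
			data[i] = 255
		}
		atomic.StoreInt32(&fence, 2) // publish: all writes above must be visible first
	}()

	for atomic.LoadInt32(&fence) < 2 {
		runtime.Gosched() // not published yet
	}
	// uint8 arithmetic wraps, so 255+255 yields 254; a stale data[1023]
	// would yield 255, and two stale reads would yield 17.
	return data[0] + data[1023]
}

func main() {
	fmt.Println(publishAndRead())
}
```

If the store of 2 behaves like a release (or stronger) and the load like an acquire (or stronger), the only possible result is 254. Outcomes (a) and (b) in the question correspond to the two ways that ordering could fail.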

If not, my program will be non-portable and produce errors. So, I'd like to know for sure.

(Yes, I know the test case above is definitely entirely non-portable / only runs on GNU/Linux... the actual library in question is in fact portable.)


Score: 2








There's a sort of impedance mismatch, as it were, between the Go memory model and the (multiple) memory models available in C and C++ (see cppreference.com on C memory order options, and note that C++ has a more nuanced view than C11 did, beginning in C++20). This can, at least in theory, make for some big headaches for implementors: calls in and out of C code, via cgo, might need to do heavy-duty CPU sync if, e.g., the Go system uses some sort of total or partial store order model and the C system uses a relaxed memory model.

In practice, each implementation will strive to use the same kinds of synchronizations for atomic-load-32 and atomic-store-32, for instance. But:

> The use case here involves an audio processing library in C, which creates an OS thread, and periodically calls a user-supplied callback to retrieve audio data. This must happen in almost real-time, so I don't want to incur the overhead of cgo calls, stack swaps, and Go-land concurrency. A ring buffer can solve this problem in general, where one thread writes to the buffer, another reads, and synchronization is performed with memory fences.
> [snip]
> But - it has to work in some way, even if that's not documented. How?

You're going to have to look at each implementation, one at a time, because the "how" could, at least potentially, be different each time. So find out what your systems use on their PowerPC implementations, find out what your systems use on their ARM implementations, and so on. You'll want to have your low-level Go routines be implementation-specific, chosen to work with your low-level C routines.


Score: 1



The language itself doesn't define any atomic operations. The sync/atomic package, however, does. The issue you link is prefixed "doc:", meaning that they're only debating how to improve the documentation surrounding atomic's interaction with the Go memory model. The package still works. The operations in it are atomic as described. Any known exceptions are listed in the "Bugs" section: https://golang.org/pkg/sync/atomic/#pkg-note-BUG
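As a small illustration of what "atomic as described" means for a single variable, the sketch below has several goroutines increment one counter with `atomic.AddInt32`. The function name `countTo` and the worker counts are arbitrary choices for the example, and it says nothing about ordering relative to other memory, which is the part the linked issue discusses.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countTo starts the given number of goroutines, each of which increments
// a shared counter perWorker times, and returns the final value.
func countTo(workers, perWorker int) int32 {
	var counter int32
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				atomic.AddInt32(&counter, 1) // indivisible read-modify-write
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&counter)
}

func main() {
	fmt.Println(countTo(8, 1000)) // prints 8000: no increment is lost
}
```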

  • Posted on June 12, 2021, 01:12:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/67941030.html



