Does this small C# template make (almost) any data thread safe?
Question
The use case is this:
- One thread writes data, using SetData()
- Multiple threads may read data, using GetData()
- I plan to only use basic data types or structs as T
With these rules in mind, is this thread safe?
public struct ThreadSafeData<T>
{
    private T[] dataArray;
    private int setterIndex;
    private int lastSetIndex;

    public void Init()
    {
        dataArray = new T[2];
    }

    public void SetData(T data)
    {
        dataArray[setterIndex] = data;
        lastSetIndex = setterIndex;
        // Convert 0 to 1, and 1 to 0
        setterIndex = lastSetIndex * -1 + 1;
    }

    public T GetData()
    {
        return dataArray[lastSetIndex];
    }
}
UPDATE (requested in comments)
What I would like to achieve is the following: I want to avoid tearing, and I want the reader to always read the last value written by the writer. I tried doing that with a single T field, but then I encountered (what I think is) tearing. For example, in the tests (see below) I always write a Vector2Int of 0,0 or 1,1, but with a single T field the reader would sometimes read 1,0. This is why I added the array (and added the "data integrity" check to my tests).
I am using the x64 architecture, and this is the Vector2Int I use in my tests:
https://docs.unity3d.com/ScriptReference/Vector2Int.html
Questions
- How do I know if this is thread safe (if it is)? I have run tests for quite a while, but how do I know for sure?
- Do you know a better solution for this use case? Please let me know!
Tests
I am making a game in Unity and have run tests where the "writing thread" runs at 30, 60 or 90 fps, doing up to 300 writes per frame, and a "reading thread" runs at 30 to 300 fps (doing 1 read per frame).
The test data (T) I used was a struct with a Vector2Int and a bool. To check data integrity, the "reader" checked that the x and y of the Vector2Int are 1 when the bool is true, and 0 when it is false (it throws an error when this is wrong).
I ran these tests for about an hour and never got any errors, but I am not sure whether that means it always works correctly.
(P.S. I don't really care whether this template is a struct or a class; I am not sure yet what will work best for me.)
Answer 1
Score: 3
I managed to reproduce a torn value with your ThreadSafeData implementation, using as T the type ValueTuple<int, int, int, int, int, int, int, int, int>, and mutating it using the same Random.Next() value for all nine fields of the tuple. Demo. Hundreds of torn T values per second are observed. Like this value:
(373331022, 373331022, 373331022, 373331022, 373331022, 373331022, 373331022, 1480972221, 1480972221)
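For reference, here is a small repro sketch along those lines (this is an illustration, not the linked demo itself; the timing, the torn-value check, and the names are made up). It assumes the ThreadSafeData<T> struct from the question is in scope:

using System;
using System.Threading;
using System.Threading.Tasks;

class TearingRepro
{
    // The ThreadSafeData<T> struct from the question, shared by both threads.
    private static ThreadSafeData<(int, int, int, int, int, int, int, int, int)> _tsd;
    private static bool _stop;

    static void Main()
    {
        _tsd.Init();

        // Single writer: all nine fields of every written tuple hold the same random value.
        var writer = Task.Run(() =>
        {
            var rng = new Random();
            while (!Volatile.Read(ref _stop))
            {
                int n = rng.Next();
                _tsd.SetData((n, n, n, n, n, n, n, n, n));
            }
        });

        // Reader: any value whose fields disagree must be a torn copy.
        var reader = Task.Run(() =>
        {
            while (!Volatile.Read(ref _stop))
            {
                var v = _tsd.GetData();
                if (v.Item1 != v.Item5 || v.Item1 != v.Item9)
                    Console.WriteLine($"Torn value: {v}");
            }
        });

        Thread.Sleep(5000); // run for a few seconds
        Volatile.Write(ref _stop, true);
        Task.WaitAll(writer, reader);
    }
}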
Here are some alternative implementations of the ThreadSafeData<T> struct that are truly safe. Using the lock statement is definitely thread-safe, and also very simple:
public struct ThreadSafeData<T>
{
    private readonly object _locker = new();
    private T _data = default;

    public ThreadSafeData() { }

    public void SetData(T data) { lock (_locker) _data = data; }
    public T GetData() { lock (_locker) return _data; }
}
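For completeness, a hypothetical usage sketch of this lock-based version under the question's single-writer / multiple-reader pattern (the (int X, int Y) payload and the thread counts are made up, and the ThreadSafeData<T> above is assumed to be in scope). Because ThreadSafeData<T> is a struct, every thread must go through the same stored instance; a copy of the struct has its own _data field and would not observe updates made through the original:

using System;
using System.Threading;

class UsageSketch
{
    // One shared instance stored in a single field; the writer and every reader use this field.
    private static ThreadSafeData<(int X, int Y)> _shared = new();

    static void Main()
    {
        var writer = new Thread(() =>
        {
            for (int i = 1; i <= 100; i++)
            {
                _shared.SetData((i, i)); // exactly one writer thread
                Thread.Sleep(1);
            }
        });

        var readers = new Thread[4]; // any number of concurrent readers
        for (int r = 0; r < readers.Length; r++)
        {
            readers[r] = new Thread(() =>
            {
                for (int i = 0; i < 100; i++)
                {
                    var latest = _shared.GetData(); // always a non-torn (X, Y) pair
                    Thread.Sleep(1);
                }
            });
        }

        writer.Start();
        foreach (var t in readers) t.Start();
        writer.Join();
        foreach (var t in readers) t.Join();
        Console.WriteLine("Done");
    }
}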
The cost of an uncontended lock is in the magnitude of ~20 nanoseconds, so it's quite cheap, but if your readers are calling GetData very frequently then you might want a faster solution. That faster solution is not the [ReaderWriterLockSlim][5], though:
public struct ThreadSafeData<T>
{
    private readonly ReaderWriterLockSlim _lock = new();
    private T _data = default;

    public ThreadSafeData() { }

    public void SetData(T data)
    {
        _lock.EnterWriteLock();
        try { _data = data; }
        finally { _lock.ExitWriteLock(); }
    }

    public T GetData()
    {
        _lock.EnterReadLock();
        try { return _data; }
        finally { _lock.ExitReadLock(); }
    }
}
This is actually a little slower than the lock, because the work performed by the writer is too lightweight. The ReaderWriterLockSlim is advantageous when the writer does chunky work.
A better alternative regarding performance, but worse regarding memory allocations, is to store the T value in a [volatile][6] object field:
public struct ThreadSafeData<T>
{
    private volatile object _data = default(T);

    public ThreadSafeData() { }

    public void SetData(T data) => _data = data;
    public T GetData() => (T)_data;
}
This will cause boxing in case the T is a value type, and a new box will be allocated on each SetData operation. According to my experiments the size of the box is 16 bytes + the size of the T. The performance boost compared to the lock (in scenarios with high contention) is around x10.
Seqlock: Peter Cordes mentioned in [their answer][7] the interesting idea of the [Seqlock][8]. Below is an implementation of this idea. My tests don't reveal any tearing. Most likely the implementation below is safe for a single writer / multiple readers scenario, but I wouldn't bet my life on it. The advantage of this approach over the above volatile object is that it doesn't allocate memory.
public struct ThreadSafeData<T> // Safe with a single writer only
{
    private volatile int _seq; // sequence number; odd while a write is in progress
    private T _data;

    public void SetData(T data)
    {
        _seq++; // make the sequence number odd: a write has started
        Interlocked.MemoryBarrier(); // keep the payload store after the first increment
        _data = data;
        _seq++; // make it even again: the write has finished (volatile write = release)
    }

    public T GetData()
    {
        SpinWait spinner = default;
        while (true)
        {
            int seq1 = _seq;
            if ((seq1 & 1) != 0) goto spin; // odd: a write is in progress
            T data = _data;
            Interlocked.MemoryBarrier(); // keep the payload load before the second sequence read
            int seq2 = _seq;
            if (seq1 != seq2) goto spin; // a write happened while we were reading
            return data;
        spin:
            spinner.SpinOnce();
        }
    }
}
According to my experiments the performance of GetData() is about half of the corresponding volatile object-based implementation, but still many times faster than the lock-based one, under the condition that SetData is called infrequently. Otherwise, if the writer calls SetData in a tight loop, the readers will barely be able to read any value at all: almost always _seq will be different before and after reading _data, resulting in endless spinning.
[4]: https://www.albahari.com/threading/part2.aspx#_Locking_Performance "Threading in C# - Basic synchronization - Locking - Performance (Joseph Albahari)"
[5]: https://learn.microsoft.com/en-us/dotnet/api/system.threading.readerwriterlockslim
[6]: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/volatile
[7]: https://stackoverflow.com/a/75659510/11178549
[8]: https://en.wikipedia.org/wiki/Seqlock
Answer 2
Score: 2
2 int elements is only 64 bits; mainstream x86 and ARM CPUs can do 64-bit atomic stores and loads (even when running in 32-bit mode), so the optimal approach would be convincing your compiler to do that. (Like C++ std::atomic<my_struct> with memory_order_relaxed, or in C# perhaps just packing both members into a uint64_t, if you can get the necessary visibility guarantees for your code as well as atomicity.)
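As a rough C# sketch of that packing idea (this is an illustration, not code from the answer; it assumes the payload really is just two 32-bit values and, per the question, a 64-bit process): both ints live in one long field, so a single 64-bit store/load moves them together and they can never tear relative to each other.

using System.Threading;

public struct PackedPair
{
    private long _packed; // low 32 bits = X, high 32 bits = Y

    public void Set(int x, int y)
    {
        long packed = (uint)x | ((long)y << 32);
        // One atomic 64-bit store with release semantics on a 64-bit process;
        // on 32-bit, Interlocked.Exchange(ref _packed, packed) would be the safe choice.
        Volatile.Write(ref _packed, packed);
    }

    public (int X, int Y) Get()
    {
        // One atomic 64-bit load with acquire semantics on a 64-bit process;
        // on 32-bit, Interlocked.Read(ref _packed) would be the safe choice.
        long packed = Volatile.Read(ref _packed);
        return ((int)packed, (int)(packed >> 32));
    }
}

For a payload like the question's Vector2Int this avoids both tearing and per-write allocation, but it only works for payloads that fit into 64 bits.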
Remember that plain variables don't guarantee visibility across threads unless there's other synchronization: reading a variable in a loop can optimize to reading it into a register once, and using that value every time. Any time a load or store does happen, it's atomic, but without volatile or Volatile.Read / .Write, nothing guarantees that loads / stores happen in the asm as often as they do in the source.
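A tiny illustration of that point (hypothetical names, not from the answer): a worker polling a plain bool field may never see the update, because the read can legally be hoisted out of the loop; Volatile.Read (or marking the field volatile) forces a fresh read on every iteration.

using System.Threading;

class StopFlag
{
    private bool _stop; // plain field: no cross-thread visibility guarantee by itself

    public void RequestStop() => Volatile.Write(ref _stop, true);

    public void Worker()
    {
        // With a plain read of _stop here, the JIT may load it into a register once
        // and spin forever; Volatile.Read prevents that hoisting.
        while (!Volatile.Read(ref _stop))
        {
            // ... do work ...
        }
    }
}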
In general, you might be looking for the SeqLock, where the reader retries until it gets a non-torn read of the payload + sequence number. See:
- https://stackoverflow.com/q/54611003 (SeqLock discussion with a C++ example)
- https://stackoverflow.com/q/61237650 (SeqLock or RCU)
- https://stackoverflow.com/questions/64147529/atomic-and-lock-free-write-of-arbitrary-sized-data-vector-array (C++)
- https://stackoverflow.com/questions/75064979/how-to-get-a-stable-version-of-mutiple-values-at-a-particular-time-without-lock (C# ConcurrentQueue, which uses a strategy similar to a SeqLock of re-reading members until it gets a consistent read, with some limitations. IDK whether spin.SpinOnce(); has any compile-time memory-barrier semantics, or whether the members they're reading were declared volatile.)
If only one thread ever writes, you don't need any atomic RMWs, just atomic stores of the sequence number before/after plain stores of the payload. But you do need strict ordering of the stores in the writer and loads in the reader, and the first load of the sequence number does have to be an acquire load, like Volatile.Read.
> I ran these tests for about an hour and never got any errors.
If you tested on x86, LoadLoad and StoreStore reordering are impossible, so if your algorithm is equivalent enough to a SeqLock, then it will be safe on x86 as long as the compiler doesn't reorder operations at compile time. Having two separate "index" (sequence number?) variables would stop the compiler from doing dead-store elimination. Using just plain C# vars without anything equivalent to C++ std::atomic_thread_fence(acquire) / release will break on AArch64, though, where plain variables get weaker run-time ordering than on x86.
Your implementation isn't really like a SeqLock; it's like a 2-element queue depending on a memory_order_consume type of dependency ordering, which is a thing even on weakly-ordered ISAs like AArch64. But if another write happens while a reader is still reading, you'll spoil their data.
So it's like RCU (read copy update), with its advantage over a SeqLock that there's always a consistent state to read. But you reuse the old state on the very next write, so even if you write infrequently, a reader that gets descheduled by the OS could wake up and read inconsistent data if that happens at the same time the writer is writing the same slot it's reading.
RCU can work well in a garbage-collected language like C#, if you allocate a new object and update a reference on write. Garbage collection solves the deallocation problem, which is the hardest part of RCU (knowing when all other threads definitely aren't still reading an old copy of the payload).
You're replacing the whole payload, so not actually copying and modifying e.g. an array or binary tree, so that makes it even easier.
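A minimal C# sketch of that allocate-and-publish pattern (illustrative names; it is essentially the same mechanism as the volatile object version in the other answer): the writer builds a new immutable payload object per update and publishes it with a single reference store, readers grab the reference and get a consistent snapshot, and the GC reclaims old snapshots once no reader holds them.

using System.Threading;

// Immutable after construction, so a reader holding a reference can never observe
// a half-written payload.
public sealed class Snapshot
{
    public readonly int X;
    public readonly int Y;
    public readonly bool Flag;
    public Snapshot(int x, int y, bool flag) { X = x; Y = y; Flag = flag; }
}

public class RcuBox
{
    private volatile Snapshot _current = new Snapshot(0, 0, false);

    // Writer: allocate a fresh object and publish it; a reference store is atomic,
    // and the volatile field provides the release/acquire ordering.
    public void Publish(int x, int y, bool flag) => _current = new Snapshot(x, y, flag);

    // Readers: just read the reference.
    public Snapshot Read() => _current;
}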
P.S. The only part of C# I know is some memory-model stuff and SIMD intrinsics, because that's where the tags I follow overlap with it. That's why my answer is basically citing C++ equivalents for what's possible. I never planned to write a real answer to this, but my comments kind of became almost an answer.
If someone else wants to write a C# SeqLock implementation or example, please post it as an answer here or on a related C# question.