

Why does the race detector report a race condition, here?









I am using Go's race detector (the -race flag), and it reports some race conditions that I think should not be reported. I've created this sample code to explain my findings. Please do not comment on the goal of this example, as it has no goal other than to explain the issue.

This code:

    package main

    import (
    	"fmt"
    	"time"
    )

    var count int

    func main() {
    	go update()
    	for {
    		fmt.Println(count) // unsynchronized read
    		time.Sleep(time.Second)
    	}
    }

    func update() {
    	for {
    		time.Sleep(time.Second)
    		count++ // unsynchronized write from another goroutine
    	}
    }

is reported with a race condition.

While this code:

    package main

    import (
    	"fmt"
    	"sync"
    	"time"
    )

    var count int
    var mutex sync.RWMutex

    func main() {
    	go update()
    	for {
    		mutex.RLock() // read lock around the read
    		fmt.Println(count)
    		mutex.RUnlock()
    		time.Sleep(time.Second)
    	}
    }

    func update() {
    	for {
    		time.Sleep(time.Second)
    		mutex.Lock() // write lock around the update
    		count++
    		mutex.Unlock()
    	}
    }

is not reported with any race condition issues.

My question is: why?
There is no bug in the first code.
The main function is reading a variable that another goroutine is updating.
There is no potential hidden bug here.
The second code mutex does not provide any different behavior.

Where am I wrong here?


Score: 4








Your code contains a very clear race.

Your for loop is accessing count at the same time that the other goroutine is updating it. That's the definition of a race.

> The main function is reading a variable that another goroutine is updating.

Yes, exactly. That's what a race is.

> The second code mutex does not provide any different behavior.

Yes, it does. It prevents the variable from being read and written at the same time from different goroutines.
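
A mutex is not the only way to keep the read and the write from overlapping. As a minimal sketch (the names `counter`, `collect`, `out`, and `n` are invented here and do not come from the question), one goroutine can own the variable outright and hand each value to the reader over a channel, so no memory is shared at all:

```go
package main

import "fmt"

// counter is the only goroutine that ever touches count.
// It sends each new value on out and closes out after n updates.
// (Hypothetical helper for this sketch.)
func counter(n int, out chan<- int) {
	count := 0
	for i := 0; i < n; i++ {
		count++
		out <- count // the send is synchronized with the matching receive
	}
	close(out)
}

// collect starts the counter goroutine and gathers every value it sends.
func collect(n int) []int {
	out := make(chan int)
	go counter(n, out)
	var seen []int
	for v := range out { // the loop ends when counter closes the channel
		seen = append(seen, v)
	}
	return seen
}

func main() {
	fmt.Println(collect(3)) // prints [1 2 3]
}
```

Because the reader only ever sees copies delivered through the channel, there is no concurrent access to `count`, and a run with `-race` should have nothing to report for it.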


Score: 2











You need to draw a distinction between a synchronization bug and a data race. A synchronization bug is a property of the code, whereas a data race is a property of a particular execution of the program. The latter is a manifestation of the former, but is in general not guaranteed to occur.

> There is no bug in the first code. The main function is reading a variable that another goroutine is updating. There is no potential hidden bug here.

The race detector only detects data races, not synchronization bugs. It may miss some data races (false negatives), but it never reports false positives:

> The race detector is a powerful tool for checking the correctness of concurrent programs. It will not issue false positives, so take its warnings seriously.

In other words, when the race detector reports a data race, you can be sure that your code contains at least one synchronization bug. You need to fix such bugs; otherwise, all bets are off.

Lo and behold, your first code snippet does indeed contain a synchronization bug: the package-level variable count is read (by main) and updated (by update, started as a goroutine) concurrently without any synchronization. Here is a relevant passage of the Go Memory Model:

> Programs that modify data being simultaneously accessed by multiple goroutines must serialize such access.
> To serialize access, protect the data with channel operations or other synchronization primitives such as those in the sync and sync/atomic packages.

Using a reader/writer mutual-exclusion lock, as you did in your second snippet, fixes your synchronization bug.
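
For a plain counter, the `sync/atomic` package mentioned in the passage above is another option. The following is only a sketch under assumptions: the `run` helper, the worker count, and the bounded loops are invented so that the program terminates, and are not part of the question's code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// count is only ever accessed through sync/atomic functions.
var count int64

// update increments count n times, then signals completion.
func update(n int, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 0; i < n; i++ {
		atomic.AddInt64(&count, 1)
	}
}

// run starts the given number of update goroutines and returns the
// final value of count. (Hypothetical helper for this sketch.)
func run(workers, n int) int64 {
	atomic.StoreInt64(&count, 0)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go update(n, &wg)
	}
	// A concurrent read is fine as long as it is also atomic.
	_ = atomic.LoadInt64(&count)
	wg.Wait()
	return atomic.LoadInt64(&count)
}

func main() {
	fmt.Println(run(4, 1000)) // prints 4000
}
```

Every access to `count`, reads included, goes through an atomic operation; mixing atomic writes with a plain `fmt.Println(count)` would still be a data race.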

> The second code mutex does not provide any different behavior.

You just got lucky, when you executed the first program, that no data race occurred. In general, you have no guarantee.


Score: 1



This is off topic for Go (and the sample Go code won't trigger the problem even on x86 CPUs), but I have a demonstration proof, from roughly a decade ago at this point, that "torn reads" can produce inconsistent values even if the read and write operations are done with LOCK CMPXCHG8B, on some x86 CPUs (I think we were using early Haswell implementations).

The particular conditions that trigger this are a little difficult to set up. We had a custom allocator that had a bug: it only did four-byte alignment.<sup>1</sup> We then had a "lock-free" (single locking instruction) algorithm to add entries to a queue, with single-writer multi-reader semantics.

It turns out that LOCK CMPXCHG8B instructions "work" on misaligned pointers as long as they do not cross page boundaries. As soon as they do, though, the readers can see a torn read, in which they get half the old value and half the new value, when a writer is doing an atomic write.
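
The address arithmetic behind that condition can be sketched as follows. This is illustration only: the 4 KiB page size is an assumption, the helper names are invented, and the code checks addresses rather than reproducing the hardware behavior.

```go
package main

import "fmt"

// pageSize is an assumption for this sketch: 4 KiB pages.
const pageSize = 4096

// aligned8 reports whether addr is a multiple of 8.
func aligned8(addr uintptr) bool {
	return addr%8 == 0
}

// crossesPage reports whether a size-byte object starting at addr
// has its first and last bytes on different pages.
func crossesPage(addr, size uintptr) bool {
	return addr/pageSize != (addr+size-1)/pageSize
}

func main() {
	// Four-byte aligned only, 8 bytes span 4092..4099: crosses a page.
	fmt.Println(aligned8(4092), crossesPage(4092, 8)) // false true
	// Four-byte aligned only, 8 bytes span 4084..4091: stays in one page.
	fmt.Println(aligned8(4084), crossesPage(4084, 8)) // false false
	// Eight-byte aligned, 8 bytes span 4088..4095: stays in one page.
	fmt.Println(aligned8(4088), crossesPage(4088, 8)) // true false
}
```

Since the page size is a multiple of 8, an 8-byte-aligned 8-byte value can never straddle a page boundary, which is why four-byte alignment from the allocator was enough to expose the problem.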

The result was an extremely difficult-to-track-down bug, where the system would run well for hours or even days before tripping over one of these. I finally diagnosed it by observing the data patterns, and eventually tracked the problem down to the allocator.

<sup>1</sup>Whether this is a bug depends on how one uses the allocated objects, but we were using them as 8-byte-wide pointers with LOCK CMPXCHG8B instructions.

  • Published on August 3, 2021, 17:14:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/68633332.html



