英文:
Concurrent access to maps with 'range' in Go
问题
《Go博客中的“Go地图实战”》一文中提到:
> 地图在并发使用时不安全:当同时读取和写入地图时,未定义会发生什么。如果你需要在并发执行的goroutine中从地图中读取和写入数据,访问必须通过某种同步机制进行调节。保护地图的一种常见方式是使用sync.RWMutex。
然而,一种常见的访问地图的方式是使用range
关键字进行迭代。对于并发访问的目的来说,执行range
循环内部是一个“读取”操作,还是仅仅是该循环的“转交”阶段,目前并不清楚。例如,下面的代码可能会违反“地图上不允许并发读写”的规则,这取决于range
操作的具体语义/实现:
var testMap map[int]int
testMapLock := make(chan bool, 1)
testMapLock <- true
testMapSequence := 0
...
func WriteTestMap(k, v int) {
<-testMapLock
testMap[k] = v
testMapSequence++
testMapLock<-true
}
func IterateMapKeys(iteratorChannel chan int) error {
<-testMapLock
defer func() {
testMapLock <- true
}
mySeq := testMapSequence
for k, _ := range testMap {
testMapLock <- true
iteratorChannel <- k
<-testMapLock
if mySeq != testMapSequence {
close(iteratorChannel)
return errors.New("concurrent modification")
}
}
return nil
}
这里的想法是,当第二个函数等待消费者获取下一个值时,range
“迭代器”是开放的,并且写入者此时不会被阻塞。然而,在单个迭代器中,从未出现两次读取位于写入的两侧的情况。这是一种“快速失败”迭代器,借用了Java的术语。
请问在语言规范或其他文档中是否有任何内容表明这样做是合法的?我可以看到它可能有两种方式,并且上述引用的文档对于“读取”到底意味着什么并不清楚。文档在for
/range
语句的并发方面似乎完全没有提及。
(请注意,这个问题是关于for/range
的并发性,但不是重复的问题:https://stackoverflow.com/questions/34080734/golang-concurrent-map-access-with-range - 使用案例完全不同,我在这里询问的是关于range
关键字的精确锁定要求!)
英文:
The "Go maps in action" entry in the Go blog states:
> Maps are not safe for concurrent use: it's not defined what happens when you read and write to them simultaneously. If you need to read from and write to a map from concurrently executing goroutines, the accesses must be mediated by some kind of synchronization mechanism. One common way to protect maps is with sync.RWMutex.
However, one common way to access maps is to iterate over them with the range
keyword. It is not clear if for the purposes of concurrent access, execution inside a range
loop is a "read", or just the "turnover" phase of that loop. For example, the following code may or may not run afoul of the "no concurrent r/w on maps" rule, depending on the specific semantics / implementation of the range
operation:
var testMap map[int]int
testMapLock := make(chan bool, 1)
testMapLock <- true
testMapSequence := 0
...
func WriteTestMap(k, v int) {
<-testMapLock
testMap[k] = v
testMapSequence++
testMapLock<-true
}
func IterateMapKeys(iteratorChannel chan int) error {
<-testMapLock
defer func() {
testMapLock <- true
}
mySeq := testMapSequence
for k, _ := range testMap {
testMapLock <- true
iteratorChannel <- k
<-testMapLock
if mySeq != testMapSequence {
close(iteratorChannel)
return errors.New("concurrent modification")
}
}
return nil
}
The idea here is that the range
"iterator" is open when the second function is waiting for a consumer to take the next value, and the writer is not blocked at that time. However, it is never the case that two reads in a single iterator are on either side of a write - this is a "fail fast" iterator, the borrow a Java term.
Is there anything anywhere in the language specification or other documents that indicates if this is a legitimate thing to do, however? I could see it going either way, and the above quoted document is not clear on exactly what consititutes a "read". The documentation seems totally quiet on the concurrency aspects of the for
/range
statement.
(Please note this question is about the currency of for/range
, but not a duplicate of: https://stackoverflow.com/questions/34080734/golang-concurrent-map-access-with-range - the use case is completely different and I am asking about the precise locking requirement wrt the 'range' keyword here!)
答案1
得分: 20
你正在使用一个带有range
表达式的for
语句。引用自规范:For语句:
> 在循环开始之前,range表达式只会被评估一次,有一个例外:如果range表达式是一个数组或指向数组的指针,并且最多只有一个迭代变量存在,那么只会评估range表达式的长度;如果该长度是常量,根据定义,range表达式本身将不会被评估。
我们正在对一个映射进行迭代,所以它不是一个例外:range表达式在循环开始之前只会被评估一次。range表达式只是一个映射变量testMap
:
for k, _ := range testMap {}
映射值不包括键值对,它只是指向包含这些新对的数据结构的指针。为什么这很重要?因为映射值只会被评估一次,如果以后向映射中添加了新的键值对,那么在循环之前评估的映射值仍然是一个指向包含这些新对的数据结构的映射。这与对切片进行迭代不同(切片也会被评估一次),切片只是一个指向包含元素的后备数组的头部;但是,如果在迭代过程中向切片添加元素,即使这不会导致分配和复制到新的后备数组,这些元素也不会包含在迭代中(因为切片头部已经包含了长度-已经评估)。向切片追加元素可能会导致新的切片值,但是向映射添加对不会导致新的映射值。
现在来看迭代:
for k, v := range testMap {
t1 := time.Now()
someFunction()
t2 := time.Now()
}
在进入块之前,在t1 := time.Now()
行之前,k
和v
变量保存了迭代的值,它们已经从映射中读取出来了(否则它们不能保存这些值)。问题是:你认为映射在t1
和t2
之间被for ... range
语句读取了吗?在什么情况下会发生这种情况?我们这里有一个执行someFunc()
的单个goroutine。要能够通过for
语句_访问_映射,这要么需要_另一个_ goroutine,要么需要_暂停_someFunc()
。显然,这两种情况都不会发生。(for ... range
结构不是一个多goroutine的怪物。)无论有多少次迭代,在执行someFunc()
时,映射都不会被for
语句访问。
所以回答你的一个问题:在执行迭代时,映射不会在for
块内部被访问,但是在为下一次迭代设置(分配)k
和v
值时会被访问。这意味着对映射的以下迭代对于并发访问是_安全的_:
var (
testMap = make(map[int]int)
testMapLock = &sync.RWMutex{}
)
func IterateMapKeys(iteratorChannel chan int) error {
testMapLock.RLock()
defer testMapLock.RUnlock()
for k, v := range testMap {
testMapLock.RUnlock()
someFunc()
testMapLock.RLock()
if someCond {
return someErr
}
}
return nil
}
请注意,在IterateMapKeys()
中解锁应该(必须)作为延迟语句进行,因为在你的原始代码中,如果出现错误,你可能会提前返回,此时你没有解锁,这意味着映射保持锁定状态!(这里通过if someCond {...}
模拟。)
还要注意,这种类型的锁定仅在并发访问的情况下确保锁定。**它不能阻止并发goroutine修改(例如添加新的对)映射。**如果正确使用写锁进行保护,修改(如果有)将是安全的,并且循环可以继续,但不能保证for
循环将迭代新对:
> 如果在迭代过程中删除尚未到达的映射条目,则相应的迭代值将不会生成。如果在迭代过程中创建映射条目,则该条目可能在迭代过程中生成或跳过。每个条目的选择可能因每次迭代而异。
受写锁保护的修改可能如下所示:
func WriteTestMap(k, v int) {
testMapLock.Lock()
defer testMapLock.Unlock()
testMap[k] = v
}
现在,如果你在for
块中释放读锁,一个并发的goroutine可以自由地获取写锁并对映射进行修改。在你的代码中:
testMapLock <- true
iteratorChannel <- k
<-testMapLock
当在iteratorChannel
上发送k
时,一个并发的goroutine可能会修改映射。这不仅仅是一个“不幸”的情况,如果通道的缓冲区已满,发送值通常是一个“阻塞”操作,如果没有另一个goroutine准备好接收,发送操作将无法进行。在通道上发送值是运行时运行其他goroutine的良好调度点,即使在同一个操作系统线程上也是如此,更不用说如果有多个操作系统线程,其中一个可能已经“等待”写锁以执行映射修改。
总结最后一部分:在for
块内部释放读锁就像对其他人大喊:“来吧,如果你敢的话,现在修改映射!”因此,在你的代码中遇到mySeq != testMapSequence
是非常可能的。请参阅以下可运行示例以演示它(这是你示例的变体):
package main
import (
"fmt"
"math/rand"
"sync"
)
var (
testMap = make(map[int]int)
testMapLock = &sync.RWMutex{}
testMapSequence int
)
func main() {
go func() {
for {
k := rand.Intn(10000)
WriteTestMap(k, 1)
}
}()
ic := make(chan int)
go func() {
for _ = range ic {
}
}()
for {
if err := IterateMapKeys(ic); err != nil {
fmt.Println(err)
}
}
}
func WriteTestMap(k, v int) {
testMapLock.Lock()
defer testMapLock.Unlock()
testMap[k] = v
testMapSequence++
}
func IterateMapKeys(iteratorChannel chan int) error {
testMapLock.RLock()
defer testMapLock.RUnlock()
mySeq := testMapSequence
for k, _ := range testMap {
testMapLock.RUnlock()
iteratorChannel <- k
testMapLock.RLock()
if mySeq != testMapSequence {
//close(iteratorChannel)
return fmt.Errorf("concurrent modification %d", testMapSequence)
}
}
return nil
}
示例输出:
concurrent modification 24
concurrent modification 41
concurrent modification 463
concurrent modification 477
concurrent modification 482
concurrent modification 496
concurrent modification 508
concurrent modification 521
concurrent modification 525
concurrent modification 535
concurrent modification 541
concurrent modification 555
concurrent modification 561
concurrent modification 565
concurrent modification 570
concurrent modification 577
concurrent modification 591
concurrent modification 593
我们经常遇到并发修改!
你想避免这种类型的并发修改吗?解决方案非常简单:不要在for
内部释放读锁。另外,使用-race
选项运行你的应用程序以检测竞态条件:go run -race testmap.go
最后的思考
语言规范明确允许你在_同一个goroutine_中修改映射,这是前面引用的内容所涉及的(“如果在迭代过程中删除尚未到达的映射条目...如果在迭代过程中创建映射条目...”)。在同一个goroutine中修改映射是允许的并且是安全的,但是迭代器逻辑如何处理它并没有定义。
如果映射在另一个goroutine中被修改,如果你使用适当的同步,Go内存模型保证具有for ... range
的goroutine将观察到所有的修改,并且迭代器逻辑将像它“自己”的goroutine修改一样看待它-这正如前面所述是允许的。
英文:
You are using a for
statement with a range
expression. Quoting from Spec: For statements:
> The range expression is evaluated once before beginning the loop, with one exception: if the range expression is an array or a pointer to an array and at most one iteration variable is present, only the range expression's length is evaluated; if that length is constant, by definition the range expression itself will not be evaluated.
We're ranging over a map, so it's not an exception: the range expression is evaluated only once before beginning the loop. The range expression is simply a map variable testMap
:
for k, _ := range testMap {}
The map value does not include the key-value pairs, it only points to a data structure that does. Why is this important? Because the map value is only evaluated once, and if later pairs are added to the map, the map value –evaluated once before the loop– will be a map that still points to a data structure that includes those new pairs. This is in contrast to ranging over a slice (which would be evaluated once too), which is also only a header pointing to a backing array holding the elements; but if elements are added to the slice during the iteration, even if that does not result in allocating and copying over to a new backing array, they will not be included in the iteration (because the slice header also contains the length - already evaluated). Appending elements to a slice may result in a new slice value, but adding pairs to a map will not result in a new map value.
Now on to iteration:
for k, v := range testMap {
t1 := time.Now()
someFunction()
t2 := time.Now()
}
Before we enter into the block, before the t1 := time.Now()
line k
and v
variables are holding the values of the iteration, they are already read out from the map (else they couldn't hold the values). Question: do you think the map is read by the for ... range
statement between t1
and t2
? Under what circumstances could that happen? We have here a single goroutine that is executing someFunc()
. To be able to access the map by the for
statement, that would either require another goroutine, or it would require to suspend someFunc()
. Obviously neither of those happen. (The for ... range
construct is not a multi-goroutine monster.) No matter how many iterations there are, while someFunc()
is executed, the map is not accessed by the for
statement.
So to answer one of your questions: the map is not accessed inside the for
block when executing an iteration, but it is accessed when the k
and v
values are set (assigned) for the next iteration. This implies that the following iteration over the map is safe for concurrent access:
var (
testMap = make(map[int]int)
testMapLock = &sync.RWMutex{}
)
func IterateMapKeys(iteratorChannel chan int) error {
testMapLock.RLock()
defer testMapLock.RUnlock()
for k, v := range testMap {
testMapLock.RUnlock()
someFunc()
testMapLock.RLock()
if someCond {
return someErr
}
}
return nil
}
Note that unlocking in IterateMapKeys()
should (must) happen as a deferred statement, as in your original code you may return "early" with an error, in which case you didn't unlock, which means the map remained locked! (Here modeled by if someCond {...}
).
Also note that this type of locking only ensures locking in case of concurrent access. It does not prevent a concurrent goroutine to modify (e.g. add a new pair) the map. The modification (if properly guarded with write lock) will be safe, and the loop may continue, but there is no guarantee that the for loop will iterate over the new pair:
> If map entries that have not yet been reached are removed during iteration, the corresponding iteration values will not be produced. If map entries are created during iteration, that entry may be produced during the iteration or may be skipped. The choice may vary for each entry created and from one iteration to the next.
The write-lock-guarded modification may look like this:
func WriteTestMap(k, v int) {
testMapLock.Lock()
defer testMapLock.Unlock()
testMap[k] = v
}
Now if you release the read lock in the block of the for
, a concurrent goroutine is free to grab the write lock and make modifications to the map. In your code:
testMapLock <- true
iteratorChannel <- k
<-testMapLock
When sending k
on the iteratorChannel
, a concurrent goroutine may modify the map. This is not just an "unlucky" scenario, sending a value on a channel is often a "blocking" operation, if the channel's buffer is full, another goroutine must be ready to receive in order for the send operation to proceed. Sending a value on a channel is a good scheduling point for the runtime to run other goroutines even on the same OS thread, not to mention if there are multiple OS threads, of which one may already be "waiting" for the write lock in order to carry out a map modification.
To sum the last part: you releasing the read lock inside the for
block is like yelling to others: "Come, modify the map now if you dare!" Consequently in your code encountering that mySeq != testMapSequence
is very likely. See this runnable example to demonstrate it (it's a variation of your example):
package main
import (
"fmt"
"math/rand"
"sync"
)
var (
testMap = make(map[int]int)
testMapLock = &sync.RWMutex{}
testMapSequence int
)
func main() {
go func() {
for {
k := rand.Intn(10000)
WriteTestMap(k, 1)
}
}()
ic := make(chan int)
go func() {
for _ = range ic {
}
}()
for {
if err := IterateMapKeys(ic); err != nil {
fmt.Println(err)
}
}
}
func WriteTestMap(k, v int) {
testMapLock.Lock()
defer testMapLock.Unlock()
testMap[k] = v
testMapSequence++
}
func IterateMapKeys(iteratorChannel chan int) error {
testMapLock.RLock()
defer testMapLock.RUnlock()
mySeq := testMapSequence
for k, _ := range testMap {
testMapLock.RUnlock()
iteratorChannel <- k
testMapLock.RLock()
if mySeq != testMapSequence {
//close(iteratorChannel)
return fmt.Errorf("concurrent modification %d", testMapSequence)
}
}
return nil
}
Example output:
concurrent modification 24
concurrent modification 41
concurrent modification 463
concurrent modification 477
concurrent modification 482
concurrent modification 496
concurrent modification 508
concurrent modification 521
concurrent modification 525
concurrent modification 535
concurrent modification 541
concurrent modification 555
concurrent modification 561
concurrent modification 565
concurrent modification 570
concurrent modification 577
concurrent modification 591
concurrent modification 593
We're encountering concurrent modification quite often!
Do you want to avoid this kind of concurrent modification? The solution is quite simple: don't release the read lock inside the for
. Also run your app with the -race
option to detect race conditions: go run -race testmap.go
Final thoughts
The language spec clearly allows you to modify the map in the same goroutine while ranging over it, this is what the previous quote relates to ("If map entries that have not yet been reached are removed during iteration.... If map entries are created during iteration..."). Modifying the map in the same goroutine is allowed and is safe, but how it is handled by the iterator logic is not defined.
If the map is modified in another goroutine, if you use proper synchronization, The Go Memory Model guarantees that the goroutine with the for ... range
will observe all modifications, and the iterator logic will see it just as if "its own" goroutine would have modified it – which is allowed as stated before.
答案2
得分: 3
并发访问map
的for
range
循环的单位是map
本身。Go maps in action。
map
是一种动态数据结构,用于插入、更新和删除操作。Inside the Map Implementation。例如,
> map
的迭代顺序未指定,并且不能保证从一次迭代到下一次迭代保持相同。如果在迭代过程中删除了尚未访问到的map
条目,则相应的迭代值将不会产生。如果在迭代过程中创建了map
条目,则该条目可能在迭代过程中产生或被跳过。每个条目的选择可能因创建的条目而异,也可能因每次迭代而异。如果map
为nil
,则迭代次数为0。For statements, The Go Programming Language Specification
在插入、更新和删除交替进行的情况下,使用for
range
循环读取map
可能不太有用。
对map
进行加锁:
package main
import (
"sync"
)
var racer map[int]int
var race sync.RWMutex
func Reader() {
race.RLock() // 锁定map
for k, v := range racer {
_, _ = k, v
}
race.RUnlock()
}
func Write() {
for i := 0; i < 1e6; i++ {
race.Lock()
racer[i/2] = i
race.Unlock()
}
}
func main() {
racer = make(map[int]int)
Write()
go Write()
Reader()
}
在读取后不要加锁 -- fatal error: concurrent map iteration and map write
:
package main
import (
"sync"
)
var racer map[int]int
var race sync.RWMutex
func Reader() {
for k, v := range racer {
race.RLock() // 读取后加锁
_, _ = k, v
race.RUnlock()
}
}
func Write() {
for i := 0; i < 1e6; i++ {
race.Lock()
racer[i/2] = i
race.Unlock()
}
}
func main() {
racer = make(map[int]int)
Write()
go Write()
Reader()
}
使用Go数据竞争检测器。阅读Introducing the Go Race Detector。
英文:
The unit of concurrent access for a for
range
loop over a map
is the map. Go maps in action.
A map is a dynamic data structure that changes for inserts, updates and deletes. Inside the Map Implementation. For example,
> The iteration order over maps is not specified and is not guaranteed
> to be the same from one iteration to the next. If map entries that
> have not yet been reached are removed during iteration, the
> corresponding iteration values will not be produced. If map entries
> are created during iteration, that entry may be produced during the
> iteration or may be skipped. The choice may vary for each entry
> created and from one iteration to the next. If the map is nil, the
> number of iterations is 0. For statements, The Go Programming
> Language Specification
Reading a map with a for
range
loop with interleaved inserts, updates and deletes is unlikely to be useful.
Lock the map:
package main
import (
"sync"
)
var racer map[int]int
var race sync.RWMutex
func Reader() {
race.RLock() // Lock map
for k, v := range racer {
_, _ = k, v
}
race.RUnlock()
}
func Write() {
for i := 0; i < 1e6; i++ {
race.Lock()
racer[i/2] = i
race.Unlock()
}
}
func main() {
racer = make(map[int]int)
Write()
go Write()
Reader()
}
Don't lock after the read -- fatal error: concurrent map iteration and map write
:
package main
import (
"sync"
)
var racer map[int]int
var race sync.RWMutex
func Reader() {
for k, v := range racer {
race.RLock() // Lock after read
_, _ = k, v
race.RUnlock()
}
}
func Write() {
for i := 0; i < 1e6; i++ {
race.Lock()
racer[i/2] = i
race.Unlock()
}
}
func main() {
racer = make(map[int]int)
Write()
go Write()
Reader()
}
Use the Go Data Race Detector. Read Introducing the Go Race Detector.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论