How to write a concurrent for loop in Go with sync/errgroup package

Question

I would like to concurrently perform an operation on each element of a slice.
I am using the golang.org/x/sync/errgroup package to handle concurrency.

Here is a minimal reproduction on the Go Playground: https://go.dev/play/p/yBCiy8UW_80

package main

import (
    "fmt"
    "golang.org/x/sync/errgroup"
)

func main() {
    eg := errgroup.Group{}
    input := []int{0, 1, 2}
    output1 := []int{}
    output2 := make([]int, len(input))
    for i, n := range input {
        eg.Go(func() (err error) {
            output1 = append(output1, n+1)
            output2[i] = n + 1
            return nil
        })
    }
    eg.Wait()
    fmt.Printf("with append %+v", output1)
    fmt.Println()
    fmt.Printf("with make %+v", output2)
}

This outputs:

with append [3 3 3]
with make [0 0 3]

versus the expected [1 2 3] for both slices.

Answer 1

Score: 4

You have two separate issues going on here:


First, the variables in your loop are changing before each goroutine gets a chance to read them. When you have a loop like

for i, n := range input {
  // ...
}

the variables i and n live for the whole duration of the loop. (This is the behavior in Go versions before 1.22; from Go 1.22 on, each iteration gets its own i and n.) When control reaches the bottom of the loop and jumps back up to the top, those variables get assigned new values. If a goroutine started in the loop is using those variables, their values can change before the goroutine reads them. This is why you are seeing the same number show up multiple times in the output of your example: the goroutine started in the first loop iteration doesn't start executing until n has already been set to 2.

To solve this, you can do what NotX's answer shows and create new variables that are scoped to just a single iteration of the loop:

for i, n := range input {
  ic, nc := i, n
  // use ic and nc instead of i and n
}

Variables declared inside a loop are scoped to just a single iteration of the loop, so when the next iteration of the loop starts, entirely new variables get created, preventing the originals from changing between when the goroutine is launched and when it actually starts running.


Second, you are concurrently modifying the same value from different goroutines, which isn't safe. In particular, you're using append to append to the same slice concurrently. That is a data race: the result is undefined, and elements can be lost or overwritten.

There are two ways to deal with this. The first one you already have set up: pre-allocate an output slice with make and then have each goroutine fill in a specific position in the slice:

output := make([]int, len(input)) // one slot per input element
for i, n := range input {
  ic, nc := i, n
  eg.Go(func() (err error) {
    output[ic] = nc + 1
    return nil
  })
}
eg.Wait()

This works great if you know how many outputs you're going to have when you start the loop.

The other option is to use some kind of locking to control access to the output slice. sync.Mutex works great for this:

var output []int
var mu sync.Mutex // guards output
for _, n := range input {
  nc := n
  eg.Go(func() (err error) {
    mu.Lock()
    defer mu.Unlock()
    output = append(output, nc+1)
    return nil
  })
}
eg.Wait()

This also works when you don't know how many outputs there will be, but it guarantees nothing about the order of the output: elements can appear in any order. If you need them in order, you can sort after all the goroutines finish.

Answer 2

Score: 2

There is no guarantee about the order in which goroutines run. So while it's fine to expect the elements 1, 2, 3, you shouldn't make any assumption about their order.

Anyway, it looks like the function passed to the first eg.Go() call only runs once the for loop has already reached its third element. This is why you only get 3, and why the indexed write only lands at the third position (where i=2).

If you copy your values like this, the problem is somewhat fixed:

for i, n := range input {
	nc, ic := n, i
	eg.Go(func() (err error) {
		output1 = append(output1, nc+1)
		output2[ic] = nc + 1
		return nil
	})
}

That said, for me the result looks like this:

with append [3 2 1]
with make [1 2 3]

So the order still isn't what we might have expected.
I'm no expert on the errgroup package, though, so maybe somebody else can share more information about the order of execution.

huangapple
  • Published on November 10, 2022, 00:57:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74378624.html