英文:
Reasonable use of goroutines in Go programs
问题
我的程序有一个长时间运行的任务。我有一个名为jdIdList
的列表,它太大了,最多有1000000
个项目,所以下面的代码无法正常工作。有没有办法通过更好地使用goroutines来改进代码?
似乎我运行了太多的goroutines,导致我的代码无法运行。
运行的合理goroutines数量是多少?
var wg sync.WaitGroup
wg.Add(len(jdIdList))
c := make(chan string)
// 将jdIdList视为[0...1000000]
for _, jdId := range jdIdList {
go func(jdId string) {
defer wg.Done()
for _, itemId := range itemIdList {
// 下面的代码执行一些计算,耗时较长(你可以用time.Sleep(time.Second * 1)来替代)
cvVec, ok := cvVecMap[itemId]
if !ok {
continue
}
jdVec, ok := jdVecMap[jdId]
if !ok {
continue
}
// 长时间的计算
_ = 0.3*computeDist(jdVec.JdPosVec, cvVec.CvPosVec) + 0.7*computeDist(jdVec.JdDescVec, cvVec.CvDescVec)
}
c <- fmt.Sprintf("done %s", jdId)
}(jdId)
}
go func() {
for resp := range c {
fmt.Println(resp)
}
}()
英文:
My program has a long running task. I have a list jdIdList
that is too big - up to 1000000
items, so the code below doesn't work. Is there a way to improve the code with better use of goroutines?
It seems I have too many goroutines running which makes my code fail to run.
What is a reasonable number of goroutines to have running?
var wg sync.WaitGroup
wg.Add(len(jdIdList))
c := make(chan string)
// just think jdIdList as [0...1000000]
for _, jdId := range jdIdList {
go func(jdId string) {
defer wg.Done()
for _, itemId := range itemIdList {
// following code is doing some computation which consumes much time(you can just replace them with time.Sleep(time.Second * 1)
cvVec, ok := cvVecMap[itemId]
if !ok {
continue
}
jdVec, ok := jdVecMap[jdId]
if !ok {
continue
}
// long time compute
_ = 0.3*computeDist(jdVec.JdPosVec, cvVec.CvPosVec) + 0.7*computeDist(jdVec.JdDescVec, cvVec.CvDescVec)
}
c <- fmt.Sprintf("done %s", jdId)
}(jdId)
}
go func() {
for resp := range c {
fmt.Println(resp)
}
}()
答案1
得分: 2
看起来你同时运行了太多的任务,导致计算机内存不足。
这是你的代码的一个版本,它使用了有限数量的工作协程,而不是像你的示例中那样使用一百万个协程。由于只有少数几个协程同时运行,它们在系统开始交换之前有更多的可用内存。确保每个小计算所需的内存乘以并发协程的数量小于系统中的内存。因此,如果for jdId := range work
循环内的代码所需的内存小于1GB,并且您有4个核心和至少4GB的RAM,将clvl
设置为4
应该可以正常工作。
我还删除了等待组。代码仍然正确,但只使用通道进行同步。对通道的for range循环会从该通道读取,直到通道关闭。这是我们告诉工作线程我们何时完成的方式。
以下是修改后的代码:
runtime.GOMAXPROCS(runtime.NumCPU()) // 在 go 1.5 或更高版本中不需要
c := make(chan string)
work := make(chan int, 1) // 将 1 增加到更大的数字可能会提高性能
clvl := 4 // runtime.NumCPU() // 模拟有 4 个核心,否则使用 NumCPU
var wg sync.WaitGroup
wg.Add(clvl)
for i := 0; i < clvl; i++ {
go func(i int) {
for jdId := range work {
time.Sleep(time.Millisecond * 100)
c <- fmt.Sprintf("done %d", jdId)
}
wg.Done()
}(i)
}
// 给工作线程一些任务
go func() {
for i := 0; i < 10; i++ {
work <- i
}
close(work)
}()
// 当所有工作线程完成时关闭输出通道
go func() {
wg.Wait()
close(c)
}()
count := 0
for resp := range c {
fmt.Println(resp, count)
count += 1
}
在模拟四个 CPU 核心的情况下,该代码在 Go Playground 上生成了以下输出:
done 1 0
done 0 1
done 3 2
done 2 3
done 5 4
done 4 5
done 7 6
done 6 7
done 9 8
done 8 9
请注意,顺序不是保证的。jdId
变量保存了您想要的值。您应该始终使用Go 竞争检测器测试并发程序。
还请注意,如果您使用的是 Go 1.4 或更早版本,并且尚未将 GOMAXPROCS 环境变量设置为核心数,您应该这样做,或者在程序开头添加 runtime.GOMAXPROCS(runtime.NumCPU())
。
英文:
It looks like you're running too many things concurrently, making your computer run out of memory.
Here's a version of your code that uses a limited number of worker goroutines instead of a million goroutines as in your example. Since only a few goroutines run at once, they have much more memory available each before the system starts to swap. Make sure the memory each small computation requires times the number of concurrent goroutines is less than the memory you have in your system, so if the code inside for jdId := range work
loop requires less than 1GB memory, and you have 4 cores and at least 4 GB of RAM, setting clvl
to 4
should work fine.
I also removed the waitgroups. The code is still correct, but only uses channels for synchronization. A for range loop over a channel reads from that channel until it is closed. This is how we tell the worker threads when we are done.
https://play.golang.org/p/Sy3i77TJjA
runtime.GOMAXPROCS(runtime.NumCPU()) // not needed on go 1.5 or later
c := make(chan string)
work := make(chan int, 1) // increasing 1 to a higher number will probably increase performance
clvl := 4 // runtime.NumCPU() // simulating having 4 cores, use NumCPU otherwise
var wg sync.WaitGroup
wg.Add(clvl)
for i := 0; i < clvl; i++ {
go func(i int) {
for jdId := range work {
time.Sleep(time.Millisecond * 100)
c <- fmt.Sprintf("done %d", jdId)
}
wg.Done()
}(i)
}
// give workers something to do
go func() {
for i := 0; i < 10; i++ {
work <- i
}
close(work)
}()
// close output channel when all workers are done
go func() {
wg.Wait()
close(c)
}()
count := 0
for resp := range c {
fmt.Println(resp, count)
count += 1
}
which generated this output on go playground, while simulating four cpu cores.
done 1 0
done 0 1
done 3 2
done 2 3
done 5 4
done 4 5
done 7 6
done 6 7
done 9 8
done 8 9
Notice how the ordering is not guaranteed. The jdId
variable holds the value you want. You should always test your concurrent programs using the go race detector.
Also note that if you are using go 1.4 or earlier and haven't set the GOMAXPROCS environment variable to the number of cores, you should do that, or add runtime.GOMAXPROCS(runtime.NumCPU())
to the beginning of your program.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论