在Go程序中合理使用goroutines

huangapple go评论81阅读模式
英文:

Reasonable use of goroutines in Go programs

问题

我的程序有一个长时间运行的任务。我有一个名为jdIdList的列表,它太大了,最多有1000000个项目,所以下面的代码无法正常工作。有没有办法通过更好地使用goroutines来改进代码?

似乎我运行了太多的goroutines,导致我的代码无法运行。

运行的合理goroutines数量是多少?

var wg sync.WaitGroup
wg.Add(len(jdIdList))
c := make(chan string)

// 将jdIdList视为[0...1000000]
for _, jdId := range jdIdList {
	go func(jdId string) {
		defer wg.Done()
		for _, itemId := range itemIdList {
			// 下面的代码执行一些计算,耗时较长(你可以用time.Sleep(time.Second * 1)来替代)
			cvVec, ok := cvVecMap[itemId]
			if !ok {
				continue
			}
			jdVec, ok := jdVecMap[jdId]
			if !ok {
				continue
			}
			// 长时间的计算
			_ = 0.3*computeDist(jdVec.JdPosVec, cvVec.CvPosVec) + 0.7*computeDist(jdVec.JdDescVec, cvVec.CvDescVec)
		}
		c <- fmt.Sprintf("done %s", jdId)
	}(jdId)

}

go func() {
	for resp := range c {
		fmt.Println(resp)
	}
}()
英文:

My program has a long running task. I have a list jdIdList that is too big - up to 1000000 items, so the code below doesn't work. Is there a way to improve the code with better use of goroutines?

It seems I have too many goroutines running which makes my code fail to run.

What is a reasonable number of goroutines to have running?

var wg sync.WaitGroup
wg.Add(len(jdIdList))
c := make(chan string)

// just think jdIdList as [0...1000000]
for _, jdId := range jdIdList {
	go func(jdId string) {
		defer wg.Done()
		for _, itemId := range itemIdList {
            // following code is doing some computation which consumes much time(you can just replace them with time.Sleep(time.Second * 1)
			cvVec, ok := cvVecMap[itemId]
			if !ok {
				continue
			}
			jdVec, ok := jdVecMap[jdId]
			if !ok {
				continue
			}
			// long time compute
			_ = 0.3*computeDist(jdVec.JdPosVec, cvVec.CvPosVec) + 0.7*computeDist(jdVec.JdDescVec, cvVec.CvDescVec)
		}
		c &lt;- fmt.Sprintf(&quot;done %s&quot;, jdId)
	}(jdId)

}

go func() {
	for resp := range c {
		fmt.Println(resp)
	}
}()

答案1

得分: 2

看起来你同时运行了太多的任务,导致计算机内存不足。

这是你的代码的一个版本,它使用了有限数量的工作协程,而不是像你的示例中那样使用一百万个协程。由于只有少数几个协程同时运行,它们在系统开始交换之前有更多的可用内存。确保每个小计算所需的内存乘以并发协程的数量小于系统中的内存。因此,如果for jdId := range work循环内的代码所需的内存小于1GB,并且您有4个核心和至少4GB的RAM,将clvl设置为4应该可以正常工作。

我还删除了等待组。代码仍然正确,但只使用通道进行同步。对通道的for range循环会从该通道读取,直到通道关闭。这是我们告诉工作线程我们何时完成的方式。

以下是修改后的代码:

runtime.GOMAXPROCS(runtime.NumCPU()) // 在 go 1.5 或更高版本中不需要
c := make(chan string)
work := make(chan int, 1) // 将 1 增加到更大的数字可能会提高性能
clvl := 4 // runtime.NumCPU() // 模拟有 4 个核心,否则使用 NumCPU
var wg sync.WaitGroup
wg.Add(clvl)
for i := 0; i < clvl; i++ {
	go func(i int) {
		for jdId := range work {
			time.Sleep(time.Millisecond * 100)
			c <- fmt.Sprintf("done %d", jdId)
		}
		wg.Done()
	}(i)
}

// 给工作线程一些任务
go func() { 
	for i := 0; i < 10; i++ {
		work <- i
	}

	close(work)
}()

// 当所有工作线程完成时关闭输出通道
go func() { 
	wg.Wait()
	close(c)
}()

count := 0
for resp := range c {
	fmt.Println(resp, count)
	count += 1
}

在模拟四个 CPU 核心的情况下,该代码在 Go Playground 上生成了以下输出:

done 1 0
done 0 1
done 3 2
done 2 3
done 5 4
done 4 5
done 7 6
done 6 7
done 9 8
done 8 9

请注意,顺序不是保证的。jdId 变量保存了您想要的值。您应该始终使用Go 竞争检测器测试并发程序。

还请注意,如果您使用的是 Go 1.4 或更早版本,并且尚未将 GOMAXPROCS 环境变量设置为核心数,您应该这样做,或者在程序开头添加 runtime.GOMAXPROCS(runtime.NumCPU())

英文:

It looks like you're running too many things concurrently, making your computer run out of memory.

Here's a version of your code that uses a limited number of worker goroutines instead of a million goroutines as in your example. Since only a few goroutines run at once, they have much more memory available each before the system starts to swap. Make sure the memory each small computation requires times the number of concurrent goroutines is less than the memory you have in your system, so if the code inside for jdId := range work loop requires less than 1GB memory, and you have 4 cores and at least 4 GB of RAM, setting clvl to 4 should work fine.

I also removed the waitgroups. The code is still correct, but only uses channels for synchronization. A for range loop over a channel reads from that channel until it is closed. This is how we tell the worker threads when we are done.

https://play.golang.org/p/Sy3i77TJjA

runtime.GOMAXPROCS(runtime.NumCPU()) // not needed on go 1.5 or later
c := make(chan string)
work := make(chan int, 1) // increasing 1 to a higher number will probably increase performance
clvl := 4 // runtime.NumCPU() // simulating having 4 cores, use NumCPU otherwise
var wg sync.WaitGroup
wg.Add(clvl)
for i := 0; i &lt; clvl; i++ {
	go func(i int) {
		for jdId := range work {
			time.Sleep(time.Millisecond * 100)
			c &lt;- fmt.Sprintf(&quot;done %d&quot;, jdId)
		}
		wg.Done()
	}(i)
}

// give workers something to do
go func() { 
	for i := 0; i &lt; 10; i++ {
		work &lt;- i
	}

	close(work)
}()

// close output channel when all workers are done
go func() { 
	wg.Wait()
	close(c)
}()

count := 0
for resp := range c {
	fmt.Println(resp, count)
	count += 1
}

which generated this output on go playground, while simulating four cpu cores.

done 1 0
done 0 1
done 3 2
done 2 3
done 5 4
done 4 5
done 7 6
done 6 7
done 9 8
done 8 9

Notice how the ordering is not guaranteed. The jdId variable holds the value you want. You should always test your concurrent programs using the go race detector.

Also note that if you are using go 1.4 or earlier and haven't set the GOMAXPROCS environment variable to the number of cores, you should do that, or add runtime.GOMAXPROCS(runtime.NumCPU()) to the beginning of your program.

huangapple
  • 本文由 发表于 2016年1月15日 11:15:16
  • 转载请务必保留本文链接:https://go.coder-hub.com/34803816.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定