Generating random numbers concurrently in Go
Question
I'm new to Go and to concurrent/parallel programming in general. To try out goroutines (and hopefully see their performance benefits), I've put together a small test program that simply generates 100 million random ints: first in a single goroutine, and then in as many goroutines as reported by runtime.NumCPU().
However, I consistently get worse performance using more goroutines than using a single one. I assume I'm missing something vital in either my program's design or the way I use goroutines, channels, or other Go features. Any feedback is much appreciated.
I attach the code below.

    package main

    import "fmt"
    import "time"
    import "math/rand"
    import "runtime"

    func main() {
        // Figure out how many CPUs are available and tell Go to use all of them
        numThreads := runtime.NumCPU()
        runtime.GOMAXPROCS(numThreads)

        // Number of random ints to generate
        var numIntsToGenerate = 100000000
        // Number of ints to be generated by each spawned goroutine thread
        var numIntsPerThread = numIntsToGenerate / numThreads
        // Channel for communicating from goroutines back to main function
        ch := make(chan int, numIntsToGenerate)

        // Slices to keep resulting ints
        singleThreadIntSlice := make([]int, numIntsToGenerate, numIntsToGenerate)
        multiThreadIntSlice := make([]int, numIntsToGenerate, numIntsToGenerate)

        fmt.Printf("Initiating single-threaded random number generation.\n")
        startSingleRun := time.Now()
        // Generate all of the ints from a single goroutine, retrieve the expected
        // number of ints from the channel and put in target slice
        go makeRandomNumbers(numIntsToGenerate, ch)
        for i := 0; i < numIntsToGenerate; i++ {
            singleThreadIntSlice = append(singleThreadIntSlice, (<-ch))
        }
        elapsedSingleRun := time.Since(startSingleRun)
        fmt.Printf("Single-threaded run took %s\n", elapsedSingleRun)

        fmt.Printf("Initiating multi-threaded random number generation.\n")
        startMultiRun := time.Now()
        // Run the designated number of goroutines, each of which generates its
        // expected share of the total random ints, retrieve the expected number
        // of ints from the channel and put in target slice
        for i := 0; i < numThreads; i++ {
            go makeRandomNumbers(numIntsPerThread, ch)
        }
        for i := 0; i < numIntsToGenerate; i++ {
            multiThreadIntSlice = append(multiThreadIntSlice, (<-ch))
        }
        elapsedMultiRun := time.Since(startMultiRun)
        fmt.Printf("Multi-threaded run took %s\n", elapsedMultiRun)
    }

    func makeRandomNumbers(numInts int, ch chan int) {
        source := rand.NewSource(time.Now().UnixNano())
        generator := rand.New(source)
        for i := 0; i < numInts; i++ {
            ch <- generator.Intn(numInts*100)
        }
    }

Answer 1
Score: 6
First, let's correct and optimize a few things in your code.
Since Go 1.5, GOMAXPROCS defaults to the number of CPU cores available, so there is no need to set it (although doing so does no harm).
Numbers to generate:

    var numIntsToGenerate = 100000000
    var numIntsPerThread = numIntsToGenerate / numThreads

If numThreads is, say, 3, the multi-goroutine run will generate fewer numbers than requested (because of integer division), so let's correct the total:

    numIntsToGenerate = numIntsPerThread * numThreads

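To make the shortfall concrete, here is a minimal sketch of the arithmetic for 3 workers. The function name perWorkerShare is an assumption for illustration, not part of the original program.

```go
package main

import "fmt"

// perWorkerShare (hypothetical helper) returns how many numbers each worker
// generates and the total actually produced once the remainder is dropped.
func perWorkerShare(total, workers int) (perWorker, produced int) {
	perWorker = total / workers // integer division truncates
	produced = perWorker * workers
	return perWorker, produced
}

func main() {
	perWorker, produced := perWorkerShare(100000000, 3)
	fmt.Println(perWorker, produced) // 33333333 99999999
}
```

Without the correction, the collecting loop would wait for 100000000 values while only 99999999 are ever sent, so the last receive would never complete.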
There is no need for a buffer of 100 million values; reduce it to a sensible size (e.g. 1000):

    ch := make(chan int, 1000)

If you want to use append(), the slices you create should have 0 length (and the proper capacity):

    singleThreadIntSlice := make([]int, 0, numIntsToGenerate)
    multiThreadIntSlice := make([]int, 0, numIntsToGenerate)

But in your case that's unnecessary: since only one goroutine collects the results, you can simply use indexing and create the slices like this:

    singleThreadIntSlice := make([]int, numIntsToGenerate)
    multiThreadIntSlice := make([]int, numIntsToGenerate)

And when collecting results:

    for i := 0; i < numIntsToGenerate; i++ {
        singleThreadIntSlice[i] = <-ch
    }
    // ...
    for i := 0; i < numIntsToGenerate; i++ {
        multiThreadIntSlice[i] = <-ch
    }

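The append() point above is easy to miss, so here is a tiny sketch of the difference. The helper names appendWithLength and appendWithCapacity are invented for this example.

```go
package main

import "fmt"

// appendWithLength (hypothetical helper) mimics the original program: the
// slice already has n zero-valued elements, so append adds after them.
func appendWithLength(n, v int) []int {
	s := make([]int, n)
	return append(s, v)
}

// appendWithCapacity (hypothetical helper) starts with length 0 and
// capacity n, so append fills the slice from the beginning.
func appendWithCapacity(n, v int) []int {
	s := make([]int, 0, n)
	return append(s, v)
}

func main() {
	fmt.Println(appendWithLength(3, 7))   // [0 0 0 7]
	fmt.Println(appendWithCapacity(3, 7)) // [7]
}
```

In the original program this means each result slice ends up with 200 million elements, the first 100 million of them zeros, and append has to grow the backing array along the way.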
OK, the code is now better. If you run it, you will still find that the multi-goroutine version is slower. Why is that?
It's because controlling, synchronizing, and collecting results from multiple goroutines has overhead. If the task each goroutine performs is small, the communication overhead dominates and overall you lose performance.
That is the case here. Once your rand.Rand is set up, generating a single random number is very fast.
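One way to get a feel for this overhead is to time the same amount of generation with and without a channel in between. The following is only a rough sketch under assumed sizes: the function names, the count of one million, and the upper bound of 1000 are choices made for this example, and the timings will vary from machine to machine.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// generateDirect fills a slice in the calling goroutine, with no channel involved.
func generateDirect(n int) []int {
	gen := rand.New(rand.NewSource(time.Now().UnixNano()))
	out := make([]int, n)
	for i := range out {
		out[i] = gen.Intn(1000)
	}
	return out
}

// generateViaChannel produces the same amount of data, but every value
// crosses a buffered channel between two goroutines.
func generateViaChannel(n int) []int {
	ch := make(chan int, 1000)
	go func() {
		gen := rand.New(rand.NewSource(time.Now().UnixNano()))
		for i := 0; i < n; i++ {
			ch <- gen.Intn(1000)
		}
	}()
	out := make([]int, n)
	for i := range out {
		out[i] = <-ch
	}
	return out
}

func main() {
	const n = 1000000 // assumed size, small enough to run quickly

	start := time.Now()
	generateDirect(n)
	fmt.Println("direct:     ", time.Since(start))

	start = time.Now()
	generateViaChannel(n)
	fmt.Println("via channel:", time.Since(start))
}
```

If the channel version takes noticeably longer, the difference is the per-value cost of communication rather than of random number generation.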
Let's modify your "task" to be big enough so that we can see the benefit of multiple goroutines:

    // 1 million is enough now:
    var numIntsToGenerate = 1000 * 1000

    func makeRandomNumbers(numInts int, ch chan int) {
        source := rand.NewSource(time.Now().UnixNano())
        generator := rand.New(source)
        for i := 0; i < numInts; i++ {
            // Kill time, do some processing:
            for j := 0; j < 1000; j++ {
                generator.Intn(numInts * 100)
            }
            // and now return a single random number
            ch <- generator.Intn(numInts * 100)
        }
    }

In this case, to get one random number we generate 1000 random numbers and just throw them away (to do some calculation / kill time) before generating the one we return. We do this so that the calculation time of the worker goroutines outweighs the communication overhead of multiple goroutines.
Running the app now, my results on a 4-core machine:

    Initiating single-threaded random number generation.
    Single-threaded run took 2.440604504s
    Initiating multi-threaded random number generation.
    Multi-threaded run took 987.946758ms

The multi-goroutine version runs about 2.5 times faster. This means that if your goroutines delivered random numbers in blocks of 1000, you would see roughly 2.5 times faster execution (compared to generation in a single goroutine).
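Delivering numbers in blocks could look roughly like the sketch below, in which each channel send carries a slice of 1000 values instead of a single int. This is an illustration rather than the code that was measured above: blockSize, makeRandomBlocks, generateInBlocks, and the per-worker seed offset are all assumptions introduced here.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const blockSize = 1000 // assumed block size, matching the 1000 used above

// makeRandomBlocks sends numBlocks slices of blockSize random ints each.
// The seed is passed in so that workers started at the same instant differ.
func makeRandomBlocks(numBlocks int, seed int64, ch chan<- []int) {
	gen := rand.New(rand.NewSource(seed))
	for b := 0; b < numBlocks; b++ {
		block := make([]int, blockSize)
		for i := range block {
			block[i] = gen.Intn(1000000)
		}
		ch <- block // one channel operation per 1000 numbers
	}
}

// generateInBlocks starts the workers and collects every block into one slice.
func generateInBlocks(workers, blocksPerWorker int) []int {
	ch := make(chan []int, workers)
	base := time.Now().UnixNano()
	for w := 0; w < workers; w++ {
		go makeRandomBlocks(blocksPerWorker, base+int64(w), ch)
	}
	result := make([]int, 0, workers*blocksPerWorker*blockSize)
	for i := 0; i < workers*blocksPerWorker; i++ {
		block := <-ch
		result = append(result, block...)
	}
	return result
}

func main() {
	start := time.Now()
	nums := generateInBlocks(4, 250) // 4 workers x 250 blocks x 1000 = 1 million
	fmt.Println(len(nums), "numbers in", time.Since(start))
}
```

The channel is used a thousand times less often than in the one-value-per-send version, which is what lets the generation work outweigh the communication cost.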
One last note:
Your single-goroutine version also uses multiple goroutines: one to generate numbers and one to collect the results. Most likely the collector does not fully utilize a CPU core and mostly just waits for results, but still, 2 CPU cores are used. Let's estimate that "1.5" CPU cores are utilized, while the multi-goroutine version utilizes 4 CPU cores. As a rough estimate: 4 / 1.5 ≈ 2.67, which is close to our performance gain.
Answer 2
Score: 1
If you really want to generate the random numbers in parallel, each task should generate its whole batch of numbers and return them in one go, rather than generating one number at a time and feeding each to a channel, because all that reading from and writing to the channel slows things down in the multi-goroutine case. Below is the modified code, in which each task generates its required numbers in one go; this performs better with multiple goroutines. I have also used a slice of slices to collect the results from the goroutines.

    package main

    import "fmt"
    import "time"
    import "math/rand"
    import "runtime"

    func main() {
        // Figure out how many CPUs are available and tell Go to use all of them
        numThreads := runtime.NumCPU()
        runtime.GOMAXPROCS(numThreads)

        // Number of random ints to generate
        var numIntsToGenerate = 100000000
        // Number of ints to be generated by each spawned goroutine thread
        var numIntsPerThread = numIntsToGenerate / numThreads
        // Channel for communicating from goroutines back to main function
        ch := make(chan []int)

        fmt.Printf("Initiating single-threaded random number generation.\n")
        startSingleRun := time.Now()
        // Generate all of the ints from a single goroutine, retrieve the expected
        // number of ints from the channel and put in target slice
        go makeRandomNumbers(numIntsToGenerate, ch)
        singleThreadIntSlice := <-ch
        elapsedSingleRun := time.Since(startSingleRun)
        fmt.Printf("Single-threaded run took %s\n", elapsedSingleRun)

        fmt.Printf("Initiating multi-threaded random number generation.\n")
        multiThreadIntSlice := make([][]int, numThreads)
        startMultiRun := time.Now()
        // Run the designated number of goroutines, each of which generates its
        // expected share of the total random ints, retrieve the expected number
        // of ints from the channel and put in target slice
        for i := 0; i < numThreads; i++ {
            go makeRandomNumbers(numIntsPerThread, ch)
        }
        for i := 0; i < numThreads; i++ {
            multiThreadIntSlice[i] = <-ch
        }
        elapsedMultiRun := time.Since(startMultiRun)
        fmt.Printf("Multi-threaded run took %s\n", elapsedMultiRun)
        // To avoid not used warning
        fmt.Print(len(singleThreadIntSlice))
    }

    func makeRandomNumbers(numInts int, ch chan []int) {
        source := rand.NewSource(time.Now().UnixNano())
        generator := rand.New(source)
        result := make([]int, numInts)
        for i := 0; i < numInts; i++ {
            result[i] = generator.Intn(numInts * 100)
        }
        ch <- result
    }

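A related variation, not taken from either answer, is to skip the channel entirely: preallocate one result slice and let each goroutine fill its own non-overlapping part, using sync.WaitGroup to wait for completion. The sketch below assumes at least one worker; the name fillParallel, the value range, and the seeding scheme are choices made only for this example.

```go
package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"sync"
	"time"
)

// fillParallel splits out into one contiguous part per worker and lets each
// worker fill its own part. The parts do not overlap, so no locking is needed.
// It assumes workers >= 1.
func fillParallel(out []int, workers int) {
	var wg sync.WaitGroup
	chunk := (len(out) + workers - 1) / workers // round up so nothing is left over
	base := time.Now().UnixNano()
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(out) {
			break // fewer elements than workers: nothing left to assign
		}
		hi := lo + chunk
		if hi > len(out) {
			hi = len(out)
		}
		wg.Add(1)
		go func(part []int, seed int64) {
			defer wg.Done()
			gen := rand.New(rand.NewSource(seed))
			for i := range part {
				part[i] = 1 + gen.Intn(1000) // values in [1, 1000]
			}
		}(out[lo:hi], base+int64(w))
	}
	wg.Wait()
}

func main() {
	nums := make([]int, 1000000)
	start := time.Now()
	fillParallel(nums, runtime.NumCPU())
	fmt.Println(len(nums), "numbers in", time.Since(start))
}
```

Compared with the slice-of-slices version, the result is already a single flat slice, and rounding the chunk size up avoids losing the remainder when the total is not divisible by the number of workers.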