Goroutine execution time with different input data

Question
I am experimenting with goroutines to parallelize some computation. However, the execution times of the goroutines confuse me. My experimental setup is simple:
```go
runtime.GOMAXPROCS(3)

datalen := 1000000000
data21 := make([]float64, datalen)
data22 := make([]float64, datalen)
data23 := make([]float64, datalen)

t := time.Now()
res := make(chan interface{}, 3)
go func() {
    for i := 0; i < datalen; i++ {
        data22[i] = math.Sqrt(13)
    }
    res <- true
}()
go func() {
    for i := 0; i < datalen; i++ {
        data22[i] = math.Sqrt(13)
    }
    res <- true
}()
go func() {
    for i := 0; i < datalen; i++ {
        data22[i] = math.Sqrt(13)
    }
    res <- true
}()
for i := 0; i < 3; i++ {
    <-res
}
fmt.Printf("The parallel for loop took %v to run.\n", time.Since(t))
```
Notice that all three goroutines write to the same slice, `data22`. The execution time for this program is:
The parallel for loop took 7.436060182s to run.
However, if I let each goroutine handle different data as follows:
```go
runtime.GOMAXPROCS(3)

datalen := 1000000000
data21 := make([]float64, datalen)
data22 := make([]float64, datalen)
data23 := make([]float64, datalen)

t := time.Now()
res := make(chan interface{}, 3)
go func() {
    for i := 0; i < datalen; i++ {
        data21[i] = math.Sqrt(13)
    }
    res <- true
}()
go func() {
    for i := 0; i < datalen; i++ {
        data22[i] = math.Sqrt(13)
    }
    res <- true
}()
go func() {
    for i := 0; i < datalen; i++ {
        data23[i] = math.Sqrt(13)
    }
    res <- true
}()
for i := 0; i < 3; i++ {
    <-res
}
fmt.Printf("The parallel for loop took %v to run.\n", time.Since(t))
```
The execution time for this version is almost 3 times longer than the previous one, and is roughly equal to, or worse than, sequential execution without goroutines:
The parallel for loop took 20.744438468s to run.
I guess I may be using goroutines the wrong way. What is the correct way to use multiple goroutines to handle different pieces of data?
Answer 1

Score: 3
Since your example program is not performing any substantial calculation, the bottleneck is going to be the speed at which data can be written to memory. With the settings in the example, we're talking about 22 GiB of writes (3 slices × 10⁹ float64 values × 8 bytes ≈ 24 GB), which is not insignificant.

Given the difference in the run times of the two examples, one likely possibility is that the first version isn't actually writing that much to RAM. Since memory writes are cached by the CPU, the execution probably looks something like this:
- The first goroutine writes out data to a cache line representing the start of the `data22` array.
- The second goroutine writes out data to a cache line representing the same location. The CPU running the first goroutine notices that the write invalidates its own cached write, so it throws away its changes.
- The third goroutine writes out data to a cache line representing the same location. The CPU running the second goroutine notices that the write invalidates its own cached write, so it throws away its changes.
- The cache line in the third CPU is evicted and the changes are written out to RAM.
This process continues as the goroutines progress through the `data22` array. Since RAM is the bottleneck, and in this scenario we end up writing only one third as much data, it isn't that surprising that it runs approximately 3 times as fast as the second case.
Answer 2

Score: 1
You are using enormous amounts of memory: the first example only touches one slice (1000000000 × 8 bytes = 8 GB), while the second touches all three (3 × 1000000000 × 8 bytes = 24 GB). In the second example you are probably using lots of swap space. Disk I/O is very, very slow, even on an SSD.

Change `datalen := 1000000000` to `datalen := 100000000`, a 10-fold decrease. What are your run times now? Average at least three runs of each example. How much memory does your computer have? Are you using an SSD?