英文:
Merge sort with goroutines vs normal Mergesort
问题
我在Go语言中编写了两个版本的归并排序。一个版本使用了goroutines,另一个版本没有使用。我正在比较它们的性能,并且我一直看到这个链接:
https://github.com/denniss/goplayground/blob/master/src/example/sort.go#L69
这是使用了goroutines的版本。而这是没有使用的版本:
https://github.com/denniss/goplayground/blob/master/src/example/sort.go#L8
我一直在试图弄清楚为什么使用goroutines的实现比没有使用的实现性能要差得多。这是我在本地看到的数字:
go run src/main.go
[5 4 3 2 1]
普通归并排序
[1 2 3 4 5]
时间:724
使用goroutines的归并排序
[1 2 3 4 5]
时间:26690
然而,我仍然没有找到原因。想知道你们是否能给我一些建议或者想法,告诉我该做些什么或者看些什么。在我看来,使用goroutines的实现应该至少在某种程度上更好。我这样说主要是因为以下几行代码:
go MergeSortAsync(numbers[0:m], lchan)
go MergeSortAsync(numbers[m:l], rchan)
英文:
I am wrote two versions of mergesort in Go. One with goroutines and the other one without. I am comparing the performance of each and I keep seeing
https://github.com/denniss/goplayground/blob/master/src/example/sort.go#L69
That's the one using goroutines. And this is the one without
https://github.com/denniss/goplayground/blob/master/src/example/sort.go#L8
I have been trying to figure out why the goroutine implementation performs much worse than the one without. This is the number that I see locally
go run src/main.go
[5 4 3 2 1]
Normal Mergesort
[1 2 3 4 5]
Time: 724
Mergesort with Goroutines
[1 2 3 4 5]
Time: 26690
Yet I still have not been able to figure out why. Wondering if you guys can give me suggestions or ideas on what to do/look at. It seems to me that the implementation with goroutines should perform at least somewhat better. I say so mainly, because of the following lines
go MergeSortAsync(numbers[0:m], lchan)
go MergeSortAsync(numbers[m:l], rchan)
答案1
得分: 2
使用并发并不一定能使算法运行更快。实际上,除非算法本身是并行的,否则并发可能会减慢执行速度。
处理器(CPU)一次只能执行一项任务,即使对我们来说,它似乎同时执行两个任务。两个 goroutine 的指令可能会交错执行,但这并不会比单个 goroutine 运行更快。在任何给定的时刻,只有一个 goroutine 的一条指令正在被执行(根据硬件特性,有一些非常低级的例外情况)--除非你的程序在多个核心上运行。
据我所知,标准的归并排序算法并不是天然并行的;需要进行一些修改来优化它以在多个处理器上并行执行。即使你使用多个处理器,算法也需要进行优化。
这些优化通常涉及到通道的使用。我不同意“写入通道的开销很大”(Go 使其非常高效),然而,它确实引入了一个可能导致 goroutine 阻塞的可能性。实际上,写入通道本身并不会显著减慢程序的速度,瓶颈可能在于调度/同步:等待和唤醒 goroutine 进行通道的读写操作。
补充 Not_a_Golfer 的回答,我同意 goroutine 在执行 I/O 操作时表现出色--即使在单个核心上--因为这些操作远离 CPU。当一个 goroutine 在等待 I/O 时,调度器可以在此期间调度另一个与 CPU 绑定的 goroutine 运行。然而,当部署在多个处理器/核心上时,goroutine 也适用于 CPU 密集型操作。
英文:
Using concurrency does not necessarily make an algorithm run faster. In fact, unless the algorithm is inherently parallel, it can slow down the execution.
A processor (CPU) can only do one thing at a time even if, to us, it seems to be doing two things at once. The instructions of two goroutines may be interleaved, but that does not make them run any faster than a single goroutine. A single instruction from only one of the goroutines is ever being executed at any given moment (there are some very low-level exceptions to this, depending on hardware features) -- unless your program is running on more than one core.
As far as I know, the standard merge sort algorithm isn't inherently parallel; some modifications need to be made to optimize it for parallel execution on multiple processors. Even if you're using multiple processors, the algorithm needs to be optimized for it.
These optimizations usually relate to the use of channels. I wouldn't agree that "writing to channels has a big overhead" of itself (Go makes it very performant), however, it does introduce the likely possibility that a goroutine will block. It's not the actual writing to a channel that slows down the program significantly, it's scheduling/synchronizing: the waiting and waking of either goroutine to write or read from the channel is probably the bottleneck.
To complement Not_a_Golfer's answer, I will agree that goroutines definitely shine when executing I/O operations--even on a single core--since these occur far away from the CPU. While one goroutine is waiting on I/O, the scheduler can dispatch another CPU-bound goroutine to run in the meantime. However, goroutines also shine for CPU-intensive operations when deployed across multiple processors/cores.
答案2
得分: 1
正如其他人所解释的,存在并行性的成本。你需要看到足够的好处来弥补这个成本。只有当工作单元大于创建通道和接收结果的 goroutine 的成本时,才会发生这种情况。
你可以进行实验来确定工作单元应该是什么。假设工作单元是对1000个元素进行排序。在这种情况下,你可以很容易地修改你的代码,如下所示:
func MergeSortAsync(numbers []int, resultChan chan []int) {
l := len(numbers)
if l <= 1000 {
resultChan <- Mergesort(numbers)
return
}
换句话说,一旦工作单元太小,无法证明使用 goroutine 和通道的成本,就使用不带这些成本的简单的 Mergesort
。
英文:
As others have explained, there is a cost of parallelism. You need to see enough benefit to compensate for that cost. This only happens when the unit of work is larger than the cost of making channels and goroutines receiving the results.
You can experiment to determine what the unit of work should be. Suppose the unit of work is sorting 1000 elements. In that case, you can change your code very easily like so:
func MergeSortAsync(numbers [] int, resultChan chan []int) {
l := len(numbers)
if l <= 1000 {
resultChan <- Mergesort(numbers)
return
}
In other words, once the unit of work is too small to justify using goroutines and channels, use your simple Mergesort
without those costs.
答案3
得分: 0
有两个主要原因:
-
向通道写入数据会带来很大的开销。举个例子,我尝试使用通道和 goroutine 作为迭代器,但它们的速度比重复调用方法慢了大约100倍。当然,如果通过通道传输的操作需要很长时间(比如爬取网页),那么这种差异就可以忽略不计。
-
Goroutine 在基于 IO 的并发方面表现得非常出色,而在 CPU 并行方面则稍逊一筹。
考虑到这两个问题,要想提高速度,你需要拥有大量的 CPU 或者每个 goroutine 执行更长、不那么阻塞的操作。
英文:
There are two main reasons:
-
Writing to channels has a big overhead. Just as a reference - I tried using channels and goroutines as iterators. They were ~100 times slower than calling methods repeatedly. Of course if the operation that is being piped via the channel takes a long time to perform (say crawling a web page), that difference is negligible.
-
Goroutines really shine for IO based concurrency, and less so for CPU parallelism.
Considering these 2 issues, you'll need a lot of CPUs or longer, less blocking operations per goroutine, to make this faster.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论