2014年4月4日 06:29:03go评论93阅读模式

英文:

Merge sort with goroutines vs normal Mergesort

问题

我在Go语言中编写了两个版本的归并排序。一个版本使用了goroutines，另一个版本没有使用。我正在比较它们的性能，并且我一直看到这个链接：

https://github.com/denniss/goplayground/blob/master/src/example/sort.go#L69

这是使用了goroutines的版本。而这是没有使用的版本：

https://github.com/denniss/goplayground/blob/master/src/example/sort.go#L8

我一直在试图弄清楚为什么使用goroutines的实现比没有使用的实现性能要差得多。这是我在本地看到的数字：

 go run src/main.go
[5 4 3 2 1]
普通归并排序
[1 2 3 4 5]
时间：724
使用goroutines的归并排序
[1 2 3 4 5]
时间：26690

然而，我仍然没有找到原因。想知道你们是否能给我一些建议或者想法，告诉我该做些什么或者看些什么。在我看来，使用goroutines的实现应该至少在某种程度上更好。我这样说主要是因为以下几行代码：

go MergeSortAsync(numbers[0:m], lchan)
go MergeSortAsync(numbers[m:l], rchan)

英文:

I am wrote two versions of mergesort in Go. One with goroutines and the other one without. I am comparing the performance of each and I keep seeing

https://github.com/denniss/goplayground/blob/master/src/example/sort.go#L69

That's the one using goroutines. And this is the one without

https://github.com/denniss/goplayground/blob/master/src/example/sort.go#L8

I have been trying to figure out why the goroutine implementation performs much worse than the one without. This is the number that I see locally

 go run src/main.go
[5 4 3 2 1]
Normal Mergesort
[1 2 3 4 5]
Time:  724
Mergesort with Goroutines
[1 2 3 4 5]
Time:  26690

Yet I still have not been able to figure out why. Wondering if you guys can give me suggestions or ideas on what to do/look at. It seems to me that the implementation with goroutines should perform at least somewhat better. I say so mainly, because of the following lines

go MergeSortAsync(numbers[0:m], lchan)
go MergeSortAsync(numbers[m:l], rchan)

答案1

得分: 2

使用并发并不一定能使算法运行更快。实际上，除非算法本身是并行的，否则并发可能会减慢执行速度。

处理器（CPU）一次只能执行一项任务，即使对我们来说，它似乎同时执行两个任务。两个 goroutine 的指令可能会交错执行，但这并不会比单个 goroutine 运行更快。在任何给定的时刻，只有一个 goroutine 的一条指令正在被执行（根据硬件特性，有一些非常低级的例外情况）--除非你的程序在多个核心上运行。

据我所知，标准的归并排序算法并不是天然并行的；需要进行一些修改来优化它以在多个处理器上并行执行。即使你使用多个处理器，算法也需要进行优化。

这些优化通常涉及到通道的使用。我不同意“写入通道的开销很大”（Go 使其非常高效），然而，它确实引入了一个可能导致 goroutine 阻塞的可能性。实际上，写入通道本身并不会显著减慢程序的速度，瓶颈可能在于调度/同步：等待和唤醒 goroutine 进行通道的读写操作。

补充 Not_a_Golfer 的回答，我同意 goroutine 在执行 I/O 操作时表现出色--即使在单个核心上--因为这些操作远离 CPU。当一个 goroutine 在等待 I/O 时，调度器可以在此期间调度另一个与 CPU 绑定的 goroutine 运行。然而，当部署在多个处理器/核心上时，goroutine 也适用于 CPU 密集型操作。

英文:

Using concurrency does not necessarily make an algorithm run faster. In fact, unless the algorithm is inherently parallel, it can slow down the execution.

A processor (CPU) can only do one thing at a time even if, to us, it seems to be doing two things at once. The instructions of two goroutines may be interleaved, but that does not make them run any faster than a single goroutine. A single instruction from only one of the goroutines is ever being executed at any given moment (there are some very low-level exceptions to this, depending on hardware features) -- unless your program is running on more than one core.

As far as I know, the standard merge sort algorithm isn't inherently parallel; some modifications need to be made to optimize it for parallel execution on multiple processors. Even if you're using multiple processors, the algorithm needs to be optimized for it.

These optimizations usually relate to the use of channels. I wouldn't agree that "writing to channels has a big overhead" of itself (Go makes it very performant), however, it does introduce the likely possibility that a goroutine will block. It's not the actual writing to a channel that slows down the program significantly, it's scheduling/synchronizing: the waiting and waking of either goroutine to write or read from the channel is probably the bottleneck.

To complement Not_a_Golfer's answer, I will agree that goroutines definitely shine when executing I/O operations--even on a single core--since these occur far away from the CPU. While one goroutine is waiting on I/O, the scheduler can dispatch another CPU-bound goroutine to run in the meantime. However, goroutines also shine for CPU-intensive operations when deployed across multiple processors/cores.

答案2

得分: 1

正如其他人所解释的，存在并行性的成本。你需要看到足够的好处来弥补这个成本。只有当工作单元大于创建通道和接收结果的 goroutine 的成本时，才会发生这种情况。

你可以进行实验来确定工作单元应该是什么。假设工作单元是对1000个元素进行排序。在这种情况下，你可以很容易地修改你的代码，如下所示：

func MergeSortAsync(numbers []int, resultChan chan []int) {
    l := len(numbers)
    if l <= 1000 {
        resultChan <- Mergesort(numbers)
        return
    }

换句话说，一旦工作单元太小，无法证明使用 goroutine 和通道的成本，就使用不带这些成本的简单的 Mergesort。

英文:

As others have explained, there is a cost of parallelism. You need to see enough benefit to compensate for that cost. This only happens when the unit of work is larger than the cost of making channels and goroutines receiving the results.

You can experiment to determine what the unit of work should be. Suppose the unit of work is sorting 1000 elements. In that case, you can change your code very easily like so:

func MergeSortAsync(numbers [] int, resultChan chan []int)  {
    l := len(numbers)
    if l &lt;= 1000 {
            resultChan &lt;- Mergesort(numbers)
            return
    }

In other words, once the unit of work is too small to justify using goroutines and channels, use your simple Mergesort without those costs.

答案3

得分: 0

有两个主要原因：

向通道写入数据会带来很大的开销。举个例子，我尝试使用通道和 goroutine 作为迭代器，但它们的速度比重复调用方法慢了大约100倍。当然，如果通过通道传输的操作需要很长时间（比如爬取网页），那么这种差异就可以忽略不计。
Goroutine 在基于 IO 的并发方面表现得非常出色，而在 CPU 并行方面则稍逊一筹。

考虑到这两个问题，要想提高速度，你需要拥有大量的 CPU 或者每个 goroutine 执行更长、不那么阻塞的操作。

英文:

There are two main reasons:

Writing to channels has a big overhead. Just as a reference - I tried using channels and goroutines as iterators. They were ~100 times slower than calling methods repeatedly. Of course if the operation that is being piped via the channel takes a long time to perform (say crawling a web page), that difference is negligible.
Goroutines really shine for IO based concurrency, and less so for CPU parallelism.

Considering these 2 issues, you'll need a lot of CPUs or longer, less blocking operations per goroutine, to make this faster.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

使用goroutines的归并排序与普通的归并排序相比。

问题

答案1

答案2

答案3

使用pgx和bun（Golang）与内存数据库

如何将元素从一个切片移动到另一个切片？

在Golang中实现一条命令完成MySQL的导出和恢复操作。

无法获取数据库响应以执行进一步的Go更新查询。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论