How can my Go program keep all the CPU cores busy?
Question
Goroutines are light-weight processes that are automatically time-sliced onto one or more operating system threads by the Go runtime. (This is a really cool feature of Go!)
Suppose I have a concurrent application like a webserver. There is plenty of stuff happening concurrently in my hypothetical program, with only a small non-concurrent (Amdahl's Law) fraction.
It seems that the default number of operating system threads in use is currently 1. Does this mean that only one CPU core gets used?
If I start my program with `runtime.GOMAXPROCS(runtime.NumCPU())`, will that give reasonably efficient use of all the cores on my PC?
Is there any "parallel slackness" benefit from having even more OS threads in use, e.g. via some heuristic such as `runtime.GOMAXPROCS(runtime.NumCPU() * 2)`?
Answer 1
Score: 78
From the Go FAQ:
> Why doesn't my multi-goroutine program use multiple CPUs?
>
> You must set the GOMAXPROCS shell environment variable or use the similarly-named function of the runtime package to allow the run-time support to utilize more than one OS thread.
>
> Programs that perform parallel computation should benefit from an increase in GOMAXPROCS. However, be aware that concurrency is not parallelism.
(UPDATE 8/28/2015: Go 1.5 makes the default value of GOMAXPROCS the same as the number of CPUs on your machine, so this shouldn't be a problem anymore.)
And
> Why does using GOMAXPROCS > 1 sometimes make my program slower?
>
> It depends on the nature of your program. Problems that are intrinsically sequential cannot be sped up by adding more goroutines. Concurrency only becomes parallelism when the problem is intrinsically parallel.
>
> In practical terms, programs that spend more time communicating on channels than doing computation will experience performance degradation when using multiple OS threads. This is because sending data between threads involves switching contexts, which has significant cost. For instance, the prime sieve example from the Go specification has no significant parallelism although it launches many goroutines; increasing GOMAXPROCS is more likely to slow it down than to speed it up.
>
> Go's goroutine scheduler is not as good as it needs to be. In future, it should recognize such cases and optimize its use of OS threads. For now, GOMAXPROCS should be set on a per-application basis.
In short: it is very difficult to make Go use all your cores efficiently. Simply spawning a billion goroutines and increasing GOMAXPROCS is just as likely to degrade your performance as to speed it up, because the program will be switching thread contexts all the time. If you have a large program that is parallelizable, then increasing GOMAXPROCS to the number of parallel components works fine. If you have a parallel problem embedded in a largely non-parallel program, it may speed up, or you may have to make creative use of functions like runtime.LockOSThread() to ensure the runtime distributes everything correctly (generally speaking, Go just dumbly spreads currently non-blocking goroutines haphazardly and evenly among all active threads).
Also, GOMAXPROCS limits the number of OS threads that can execute Go code simultaneously. A value greater than NumCPU is accepted rather than clamped to NumCPU, but it cannot give you more parallelism than the machine has cores. GOMAXPROCS isn't strictly equal to the number of threads: threads blocked in system calls on behalf of Go code do not count against the limit. I'm not 100% sure of exactly when the runtime decides to spawn new threads, but one instance is when the number of blocking goroutines using runtime.LockOSThread() is greater than or equal to GOMAXPROCS; it will then spawn more threads than cores so it can keep the rest of the program running sanely.
Basically, it's quite simple to increase GOMAXPROCS and make Go use all cores of your CPU. It's quite another thing at this point in Go's development to actually get it to use all cores smartly and efficiently; that requires a lot of program design and finagling to get right.
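As a rough illustration of the "parallelizable program" case described above, here is a minimal sketch of CPU-bound work split into one chunk per worker. The function name `parallelSum` and the choice of one worker per GOMAXPROCS slot are assumptions made for this example, not something the answer prescribes. Each goroutine does its computation locally and writes a single result, so there is almost no channel or lock traffic of the kind the FAQ warns about.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelSum (hypothetical helper) splits nums into up to `workers`
// contiguous chunks and sums each chunk in its own goroutine. Every
// goroutine writes only to its own slot in partial, so no locking is
// needed while the sums are computed.
func parallelSum(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	partial := make([]int, workers)
	chunk := (len(nums) + workers - 1) / workers // ceiling division

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start >= len(nums) {
			break // fewer elements than workers; nothing left to hand out
		}
		end := start + chunk
		if end > len(nums) {
			end = len(nums)
		}
		wg.Add(1)
		go func(w, start, end int) {
			defer wg.Done()
			sum := 0
			for _, v := range nums[start:end] {
				sum += v
			}
			partial[w] = sum
		}(w, start, end)
	}
	wg.Wait()

	// Combine the per-worker results sequentially.
	total := 0
	for _, s := range partial {
		total += s
	}
	return total
}

func main() {
	nums := make([]int, 1000000)
	for i := range nums {
		nums[i] = i
	}
	// GOMAXPROCS(0) reports the current setting without changing it.
	workers := runtime.GOMAXPROCS(0)
	fmt.Println(parallelSum(nums, workers))
}
```

Whether this actually runs faster than a plain loop depends on the workload and the machine, so it is worth measuring rather than assuming.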
Answer 2
Score: 6
This question cannot be answered; it is much too broad.
Take your problem, your algorithm and your workload and measure what is best for this combination.
Nobody can answer a question like "Is there any heuristic that adding twice as much salt to my lunch will make it taste better?" as this depends on the lunch (tomatoes benefit much more from salt than strawberries), your taste, and how much salt there is already. Try it.
One more thing: `runtime.GOMAXPROCS(runtime.NumCPU())` has achieved cult status, but controlling the number of threads by setting the GOMAXPROCS environment variable from the outside might be the much better option.
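To see what the environment variable does without hard-coding anything, a program can simply report its current settings. The sketch below assumes a hypothetical helper named `describeParallelism`; it relies on the fact that calling `runtime.GOMAXPROCS` with 0 queries the value without changing it.

```go
package main

import (
	"fmt"
	"runtime"
)

// describeParallelism (hypothetical helper) returns the number of logical
// CPUs visible to the process and the current GOMAXPROCS value. Passing 0
// to runtime.GOMAXPROCS only queries the setting.
func describeParallelism() (cpus, procs int) {
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	cpus, procs := describeParallelism()
	fmt.Printf("NumCPU=%d GOMAXPROCS=%d\n", cpus, procs)
}
```

Starting the compiled binary with, say, `GOMAXPROCS=2` set in the environment should make it report a GOMAXPROCS of 2, which allows the value to be tuned per deployment without recompiling.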
Answer 3
Score: 3
`runtime.GOMAXPROCS()` sets the number of (virtual) CPU cores that your program can use simultaneously. Allowing Go to use more CPU cores than you actually have won't help, as your system only has so many CPU cores.
In order to run in more than one thread, your program has to have several goroutines, typically function calls with `go someFunc()`. If your program doesn't start any additional goroutines, it will naturally run in only one thread no matter how many CPUs/cores you allow it to use.
Check out this and the following exercises on how to create goroutines.
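For readers without the exercises at hand, here is a minimal sketch of starting several goroutines and waiting for them to finish. The function `squareAll` is a made-up stand-in for the `someFunc` mentioned above; the `sync.WaitGroup` is one common way to wait for completion, not the only one.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll (hypothetical example) starts one goroutine per input value.
// Each goroutine writes to its own index of results, so the writes do not
// conflict, and wg.Wait blocks until all of them are done.
func squareAll(inputs []int) []int {
	results := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, v := range inputs {
		wg.Add(1)
		go func(i, v int) {
			defer wg.Done()
			results[i] = v * v
		}(i, v)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4}))
}
```

Without the `go` keyword the same calls would run one after another on a single thread, regardless of the GOMAXPROCS setting.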