Max number of goroutines
Question
How many goroutines can I use painlessly? For example, Wikipedia says that in Erlang 20 million processes can be created without degrading performance.
Update: I've just investigated goroutine performance a little and got the following results:
- It looks like a goroutine's lifetime is longer than the time needed to calculate sqrt() 1000 times (~45 µs for me); the only limitation is memory.
- A goroutine costs 4 to 4.5 KB.
Answer 1
Score: 95
If a goroutine is blocked, there is no cost involved other than:
- memory usage
- slower garbage collection
The costs (in terms of memory and average time to actually start executing a goroutine) are:
Go 1.6.2 (April 2016)
32-bit x86 CPU (A10-7850K 4GHz)
| Number of goroutines: 100000
| Per goroutine:
| Memory: 4536.84 bytes
| Time: 1.634248 µs
64-bit x86 CPU (A10-7850K 4GHz)
| Number of goroutines: 100000
| Per goroutine:
| Memory: 4707.92 bytes
| Time: 1.842097 µs
Go release.r60.3 (December 2011)
32-bit x86 CPU (1.6 GHz)
| Number of goroutines: 100000
| Per goroutine:
| Memory: 4243.45 bytes
| Time: 5.815950 µs
On a machine with 4 GB of memory installed, this limits the maximum number of goroutines to slightly less than 1 million.
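The "slightly less than 1 million" figure follows from dividing the memory budget by the measured per-goroutine cost. A minimal sketch of that arithmetic, assuming "4 GB" means 4 GiB and that the whole amount is available to the Go process (in practice the OS and the rest of the program take their share); `maxGoroutines` is a hypothetical helper, not part of the benchmark:

```go
package main

import "fmt"

// maxGoroutines estimates how many blocked goroutines fit into budgetBytes
// when each one costs perGoroutineBytes (stack plus runtime bookkeeping).
// Hypothetical helper for illustration only.
func maxGoroutines(budgetBytes uint64, perGoroutineBytes float64) uint64 {
	return uint64(float64(budgetBytes) / perGoroutineBytes)
}

func main() {
	// Assumption: 4 GiB, all of it usable by the Go process.
	const budget = 4 << 30

	// Per-goroutine costs measured above with Go 1.6.2 (32-bit and 64-bit).
	for _, cost := range []float64{4536.84, 4707.92} {
		fmt.Printf("%.2f bytes each -> about %d goroutines\n",
			cost, maxGoroutines(budget, cost))
	}
}
```

Both results land a little above 900,000, which matches the estimate in the answer.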
Source code (no need to read this if you already understand the numbers printed above):
    package main

    import (
        "flag"
        "fmt"
        "os"
        "runtime"
        "sync/atomic"
        "time"
    )

    var n = flag.Int("n", 1e5, "Number of goroutines to create")

    var ch = make(chan byte)

    // counter records how many goroutines have begun executing.
    // It is updated atomically so the program is free of data races.
    var counter int64

    func f() {
        atomic.AddInt64(&counter, 1)
        <-ch // Block this goroutine
    }

    func main() {
        flag.Parse()
        if *n <= 0 {
            fmt.Fprintln(os.Stderr, "invalid number of goroutines")
            os.Exit(1)
        }

        // Limit the number of OS threads executing Go code to just 1
        runtime.GOMAXPROCS(1)

        // Make a copy of MemStats before creating the goroutines
        var m0 runtime.MemStats
        runtime.ReadMemStats(&m0)

        t0 := time.Now().UnixNano()
        for i := 0; i < *n; i++ {
            go f()
        }
        runtime.Gosched() // Yield so the new goroutines run until they block
        t1 := time.Now().UnixNano()
        runtime.GC()

        // Make a copy of MemStats after all goroutines have been created
        var m1 runtime.MemStats
        runtime.ReadMemStats(&m1)

        if atomic.LoadInt64(&counter) != int64(*n) {
            fmt.Fprintln(os.Stderr, "failed to begin execution of all goroutines")
            os.Exit(1)
        }

        fmt.Printf("Number of goroutines: %d\n", *n)
        fmt.Printf("Per goroutine:\n")
        fmt.Printf("  Memory: %.2f bytes\n", float64(m1.Sys-m0.Sys)/float64(*n))
        fmt.Printf("  Time:   %f µs\n", float64(t1-t0)/float64(*n)/1e3)
    }
Answer 2
Score: 32
Hundreds of thousands, per the Go FAQ entry "Why goroutines instead of threads?":
> It is practical to create hundreds of thousands of goroutines in the same address space.
The test test/chan/goroutines.go creates 10,000 and could easily do more, but is designed to run quickly; you can change the number on your system to experiment. You can easily run millions, given enough memory, such as on a server.
To understand the maximum number of goroutines, note that the per-goroutine cost is primarily the stack. Per the FAQ again:
> …goroutines, can be very cheap: they have little overhead beyond the memory for the stack, which is just a few kilobytes.
A back-of-the-envelope calculation is to assume that each goroutine has one 4 KiB page allocated for the stack (4 KiB is a pretty uniform size), plus some small overhead for a control block (like a Thread Control Block) for the runtime; this agrees with what you observed (in 2011, pre-Go 1.0). Thus 100 Ki goroutines would take about 400 MiB of memory, and 1 Mi goroutines would take about 4 GiB of memory, which is still manageable on a desktop, a bit much for a phone, and very manageable on a server. In practice the starting stack has ranged in size from half a page (2 KiB) to two pages (8 KiB), so this is approximately correct.
The starting stack size has changed over time: it started at 4 KiB (one page), was increased to 8 KiB (two pages) in 1.2, and was decreased to 2 KiB (half a page) in 1.4. Segmented stacks caused performance problems when execution rapidly switched back and forth between segments (the "hot stack split" problem), so the size was increased in 1.2 to mitigate this, then decreased in 1.4 once contiguous stacks had replaced segmented stacks:
Go 1.2 Release Notes: Stack size:
> In Go 1.2, the minimum size of the stack when a goroutine is created has been lifted from 4KB to 8KB
Go 1.4 Release Notes: Changes to the runtime:
> the default starting size for a goroutine's stack in 1.4 has been reduced from 8192 bytes to 2048 bytes.
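To see what these starting sizes mean in practice, the sketch below multiplies each one by a goroutine count. It assumes every goroutine stays at its starting stack size and ignores the control block; `stackMemory` and the `startingStack` table are hypothetical names, with the sizes taken from the release notes quoted above.

```go
package main

import "fmt"

// startingStack lists the initial goroutine stack sizes mentioned above.
var startingStack = []struct {
	release string
	bytes   uint64
}{
	{"before 1.2", 4096},
	{"1.2", 8192},
	{"1.4", 2048},
}

// stackMemory returns the stack memory used by n goroutines that never grow
// beyond a starting stack of stackBytes. Hypothetical helper for illustration.
func stackMemory(n, stackBytes uint64) uint64 {
	return n * stackBytes
}

func main() {
	const n = 1 << 20 // 1 Mi goroutines
	for _, s := range startingStack {
		fmt.Printf("Go %-10s %4d B per stack -> %d MiB for 1 Mi goroutines\n",
			s.release, s.bytes, stackMemory(n, s.bytes)>>20)
	}
}
```

With the 4 KiB size this reproduces the 4 GiB estimate; the 2 KiB and 8 KiB sizes halve and double it.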
Per-goroutine memory is largely stack, and the stack starts small and grows on demand, so you can cheaply have many goroutines. You could use a smaller starting stack, but then it would have to grow sooner (trading time for space), and the benefit diminishes because the control block does not shrink. It is possible to eliminate the stack, at least while a goroutine is swapped out (e.g., do all allocation on the heap, or save the stack to the heap on context switch), as Erlang does, though this hurts performance and adds complexity. You would then need only the control block and the saved context, allowing another factor of 5×–10× in the number of goroutines, limited now by the control block size and the on-heap size of goroutine-local variables. However, this isn't terribly useful unless you need millions of tiny sleeping goroutines.
Since the main use of having many goroutines is for IO-bound tasks (concretely, to process blocking syscalls, notably network or file system IO), you're much more likely to run into OS limits on other resources, namely network sockets or file handles; see the golang-nuts thread "The max number of goroutines and file descriptors?". The usual way to address this is with a pool of the scarce resource, or more simply by limiting the number via a semaphore; see "Conserving File Descriptors in Go" and "Limiting Concurrency in Go".
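A common way to build such a semaphore in Go is a buffered channel whose capacity is the limit. The following is a minimal sketch, not code from the linked articles; `runLimited` is a hypothetical helper, and the peak tracking exists only to show that the limit holds. It assumes `limit` is at least 1.

```go
package main

import (
	"fmt"
	"sync"
)

// runLimited runs each task in its own goroutine while allowing at most
// limit of them to exist at once. It returns the highest concurrency seen.
// Hypothetical helper; limit must be >= 1.
func runLimited(tasks []func(), limit int) int {
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	for _, task := range tasks {
		sem <- struct{}{} // acquire: blocks while limit goroutines hold a slot
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the task is done

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			task()

			mu.Lock()
			running--
			mu.Unlock()
		}(task)
	}
	wg.Wait()
	return peak
}

func main() {
	results := make([]int, 100)
	tasks := make([]func(), len(results))
	for i := range tasks {
		i := i
		tasks[i] = func() { results[i] = i * i } // stand-in for IO-bound work
	}
	peak := runLimited(tasks, 8)
	fmt.Println("peak concurrency within limit:", peak <= 8)
}
```

Acquiring the slot before the `go` statement bounds the number of goroutines that exist, not just the number doing work, so scarce resources such as file descriptors opened inside a task stay bounded as well.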
Answer 3
Score: 8
That depends entirely on the system you are running on. But goroutines are very lightweight. An average process should have no problems with 100,000 concurrent goroutines. Whether this holds for your target platform is, of course, something we can't answer without knowing what that platform is.
Answer 4
Score: 8
To paraphrase, there are lies, damned lies, and benchmarks. As the author of the Erlang benchmark confessed,
> It goes without saying that there wasn't enough memory left in the machine to actually do anything useful. (stress-testing erlang)
What is your hardware, what is your operating system, where is your benchmark source code? What is the benchmark trying to measure and prove/disprove?
Answer 5
Score: 2
Here's a great article by Dave Cheney on this topic: http://dave.cheney.net/2013/06/02/why-is-a-goroutines-stack-infinite
Answer 6
Score: 0
If the number of goroutines ever becomes an issue, you can easily limit it for your program:
See mr51m0n/gorc and this example.
> Set thresholds on number of running goroutines
>
> Can increase and decrease a counter when starting or stopping a goroutine. It can wait for a minimum or maximum number of goroutines running, thus allowing to set thresholds for the number of gorc governed goroutines running at the same time.
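The idea described in that quote, a counter that callers can wait on until it crosses a threshold, can be sketched with the standard library alone. The code below is not the gorc API; `gate`, `enter`, `leave`, `waitAtMost`, and `runAll` are hypothetical names, and `max` is assumed to be at least 1.

```go
package main

import (
	"fmt"
	"sync"
)

// gate counts running goroutines and lets callers wait on thresholds.
// Hypothetical type for illustration; this is not the gorc API.
type gate struct {
	mu      sync.Mutex
	cond    *sync.Cond
	running int
	peak    int
}

func newGate() *gate {
	g := &gate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// enter blocks until fewer than max goroutines are running, then counts one more.
func (g *gate) enter(max int) {
	g.mu.Lock()
	for g.running >= max {
		g.cond.Wait()
	}
	g.running++
	if g.running > g.peak {
		g.peak = g.running
	}
	g.mu.Unlock()
}

// leave counts one goroutine less and wakes up all waiters.
func (g *gate) leave() {
	g.mu.Lock()
	g.running--
	g.mu.Unlock()
	g.cond.Broadcast()
}

// waitAtMost blocks until at most n goroutines are running.
func (g *gate) waitAtMost(n int) {
	g.mu.Lock()
	for g.running > n {
		g.cond.Wait()
	}
	g.mu.Unlock()
}

// runAll starts one goroutine per job, never more than max at a time,
// waits for all of them, and returns the highest number that ran at once.
func runAll(jobs []func(), max int) int {
	g := newGate()
	for _, job := range jobs {
		g.enter(max)
		go func(job func()) {
			defer g.leave()
			job()
		}(job)
	}
	g.waitAtMost(0) // every job has left the gate
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.peak
}

func main() {
	jobs := make([]func(), 20)
	for i := range jobs {
		jobs[i] = func() {} // stand-in for real work
	}
	fmt.Println("peak within threshold:", runAll(jobs, 3) <= 3)
}
```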
Answer 7
Score: -1
When the operation was CPU-bound, using more goroutines than there are cores proved to gain nothing.
In any other case you will need to test it yourself.
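For CPU-bound work, this suggests sizing the number of workers to the available parallelism rather than starting one goroutine per item. A minimal sketch, assuming the work splits into independent chunks; `sumSquares` is a hypothetical example function, and `runtime.GOMAXPROCS(0)` only queries the current setting without changing it:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares adds up x*x for all x in data, using one goroutine per
// available CPU. Hypothetical example of a CPU-bound task.
func sumSquares(data []int64) int64 {
	workers := runtime.GOMAXPROCS(0)             // query only; the setting is unchanged
	chunk := (len(data) + workers - 1) / workers // ceiling division
	partial := make([]int64, workers)            // one slot per worker, so no locking is needed

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(data) {
			break // fewer items than workers
		}
		hi := lo + chunk
		if hi > len(data) {
			hi = len(data)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			var s int64
			for _, x := range data[lo:hi] {
				s += x * x
			}
			partial[w] = s
		}(w, lo, hi)
	}
	wg.Wait()

	var total int64
	for _, s := range partial {
		total += s
	}
	return total
}

func main() {
	data := make([]int64, 1000)
	for i := range data {
		data[i] = int64(i + 1)
	}
	fmt.Println(sumSquares(data)) // prints 333833500
}
```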