Golang: why runtime.GOMAXPROCS is limited to 256?

Question

I was playing with Go 1.7.3 on a MacBook and on Ubuntu and found that runtime.GOMAXPROCS is capped at 256. Does anyone know where this limit comes from? Is it documented anywhere, and why would there be a limit at all? Is it an implementation optimization?

The only reference to 256 I could find is on the page that describes Go's runtime package: https://golang.org/pkg/runtime/. The runtime.MemStats struct has a couple of stat arrays of size 256:

type MemStats struct {
    ...
    PauseNs       [256]uint64 // circular buffer of recent GC pause durations, most recent at [(NumGC+255)%256]
    PauseEnd      [256]uint64 // circular buffer of recent GC pause end times
}

Here's the example Go code I used:

func main() {
    runtime.GOMAXPROCS(1000)
    log.Printf("GOMAXPROCS %d\n", runtime.GOMAXPROCS(-1))
}

It prints:

GOMAXPROCS 256

P.S.
Also, can someone point me to documentation on how GOMAXPROCS relates to the number of OS threads used by the Go scheduler (if it does at all)? Should we expect a compiled Go program to run exactly GOMAXPROCS OS threads?

EDIT: Thanks @twotwotwo for pointing out how GOMAXPROCS relates to OS threads. Still, it's interesting that the documentation does not mention this 256 limit (other than in the MemStats struct, which may or may not be related).

I wonder if anyone is aware of the true reason for this 256 number.

Answer 1

Score: 4

The package runtime docs clarify how GOMAXPROCS relates to OS threads:

> The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit. This package's GOMAXPROCS function queries and changes the limit.

So you could see more than GOMAXPROCS OS threads (because some are blocked in system calls, and there's no limit to how many), or fewer (because GOMAXPROCS is only documented to limit the number of threads, not prescribe it exactly).

I think capping GOMAXPROCS is consistent with the spirit of that documentation--you specified you were OK with 1000 OS threads running Go code, but the runtime decided to 'only' run 256. That doesn't limit the number of goroutines active because they're multiplexed onto OS threads--when one goroutine blocks (waiting for a network read to complete, say) Go's internal scheduler starts other work on the same OS thread.
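
A small sketch of that multiplexing, assuming nothing beyond the standard library: it limits GOMAXPROCS to 2 and still runs 10000 goroutines to completion. The function name runGoroutines and the numbers are illustrative choices, not anything from the runtime.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runGoroutines (a hypothetical helper) limits GOMAXPROCS to procs,
// starts n goroutines, waits for all of them, and returns how many ran.
func runGoroutines(procs, n int) int64 {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev) // restore the previous setting

	var done int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			runtime.Gosched() // yield so other goroutines get scheduled
			atomic.AddInt64(&done, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&done)
}

func main() {
	// At most 2 threads execute Go code at once, yet all goroutines finish.
	fmt.Println(runGoroutines(2, 10000))
}
```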

The Go team might have made this choice to minimize the chance that Go programs end up running many times more OS threads than most machines today have cores; that would cause more OS context switches, which can be slower than the user-mode goroutine switches that would occur if GOMAXPROCS were kept down to the number of CPU cores present. Or it might just have been convenient for the design of Go's internal scheduler to have an upper bound on GOMAXPROCS.

Goroutines vs Threads is not perfect, e.g. goroutines don't have segmented stacks now, but it may help you understand what's going on here under the hood.

Answer 2

Score: 4

Note that, starting with the upcoming Go 1.10 (Q1 2018), GOMAXPROCS will be limited by ... nothing.

> The runtime no longer artificially limits GOMAXPROCS (previously it was limited to 1024).

See commit ee55000 by Austin Clements (aclements), which fixes issue 15131.

> Now that allp is dynamically allocated, there's no need for a hard cap on GOMAXPROCS.


allp is defined here.

See also commit e900e27:

> ## runtime: clean up loops over allp

> allp now has length gomaxprocs, which means none of allp[i] are nil or in state _Pdead. This lets replace several different styles of loops over allp with normal range loops.

> for i := 0; i < gomaxprocs; i++ { ... } loops can simply range over allp. Likewise, range loops over allp[:gomaxprocs] can just range over allp.

> Loops that check for p == nil || p.state == _Pdead don't need to check this any more.

> Loops that check for p == nil don't have to check this if dead Ps don't affect them. I checked that all such loops are, in fact, unaffected by dead Ps. One loop was potentially affected, which this fixes by zeroing p.gcAssistTime in procresize.
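
The difference between the two loop styles can be shown with a toy model. This is not the runtime's code: the p type, the maxProcs constant, and the helpers idsFixed, idsSlice, and resize are all invented here to mirror the idea of a fixed-size array with nil slots versus a slice whose length is exactly gomaxprocs.

```go
package main

import "fmt"

// p is a stand-in for the runtime's per-processor struct (hypothetical).
type p struct{ id int }

// maxProcs plays the role of the old hard cap (hypothetical value).
const maxProcs = 8

// idsFixed models the old layout: a fixed-size array in which unused
// slots are nil, so the loop needs a bound and a nil check.
func idsFixed(allp *[maxProcs]*p, gomaxprocs int) []int {
	var ids []int
	for i := 0; i < gomaxprocs; i++ {
		if allp[i] == nil {
			continue
		}
		ids = append(ids, allp[i].id)
	}
	return ids
}

// idsSlice models the new layout: every element is live, so a plain
// range loop is enough.
func idsSlice(allp []*p) []int {
	var ids []int
	for _, pp := range allp {
		ids = append(ids, pp.id)
	}
	return ids
}

// resize models growing or shrinking the slice to exactly n entries.
func resize(allp []*p, n int) []*p {
	for len(allp) < n {
		allp = append(allp, &p{id: len(allp)})
	}
	return allp[:n]
}

func main() {
	var fixed [maxProcs]*p
	for i := 0; i < 3; i++ {
		fixed[i] = &p{id: i}
	}
	fmt.Println(idsFixed(&fixed, 3))

	allp := resize(nil, 3)
	fmt.Println(idsSlice(allp))
	// The slice can grow past maxProcs; the array cannot.
	fmt.Println(len(idsSlice(resize(allp, 20))))
}
```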

huangapple
  • Posted on December 3, 2016, 08:46:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/40943065.html