Split stacks unnecessary on amd64

Question

There seems to be an opinion out there that using a "split stack" runtime model is unnecessary on 64-bit architectures. I say seems to be, because I haven't seen anyone actually say that, only dance around it:

> The memory usage of a typical multi-threaded program can decrease
> significantly, as each thread does not require a worst-case stack
> size. It becomes possible to run millions of threads (either full NPTL
> threads or co-routines) in a 32-bit address space.
-- Ian Lance Taylor

...implying that a 64-bit address space can already handle it.

And...

> ... the constant overhead of split stacks and the narrow use case
> (spawning enormous numbers of I/O-bound tasks on 32-bit architectures)
> isn't acceptable...
-- bstrie

Two questions: First, is this what they are saying? Second, if so, why are split stacks unnecessary on 64-bit architectures?

Answer 1

Score: 19

Yes, that's what they are saying.

Split stacks are (currently) unnecessary on 64-bit architectures because the 64-bit virtual address space is so large that it can contain millions of stack address ranges, each as large as an entire 32-bit address space if needed.
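To put rough numbers on that claim, here is a small back-of-the-envelope sketch. The function name `stacksThatFit` is made up for illustration, and the 47-bit figure is an assumption about how much of the address space user code can typically reach on current amd64 systems; the paragraph above speaks of the full 64 bits.

```go
package main

import "fmt"

// stacksThatFit returns how many stack reservations of 2^stackBits bytes
// fit into an address space of 2^addrBits bytes.
// It assumes addrBits-stackBits is below 64 so the result fits in a uint64.
func stacksThatFit(addrBits, stackBits uint) uint64 {
	if stackBits > addrBits {
		return 0
	}
	return uint64(1) << (addrBits - stackBits)
}

func main() {
	// Full 64-bit space, 4 GiB (2^32 bytes) reserved per stack.
	fmt.Println(stacksThatFit(64, 32))
	// Assumed 47 usable bits, still 4 GiB per stack.
	fmt.Println(stacksThatFit(47, 32))
	// Assumed 47 usable bits, 1 MiB (2^20 bytes) per stack.
	fmt.Println(stacksThatFit(47, 20))
	// 32-bit space, 1 MiB per stack.
	fmt.Println(stacksThatFit(32, 20))
}
```

Under the 47-bit assumption the "each as large as a 32-bit address space" part no longer gives millions of stacks, but more modest per-stack reservations still do, which is the contrast with the 32-bit case that the quotes in the question hint at.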

In the flat memory model in use nowadays, the translation from virtual addresses to physical memory locations is done with the support of the hardware MMU. On amd64 it turns out to be better (meaning faster overall) to reserve a big chunk of the 64-bit virtual address space for each new stack, while mapping only the first page (4 kB) to actual RAM. The stack can then grow and shrink as needed over contiguous virtual addresses, which means less code in each function prologue, a big optimization. Whenever the stack grows above or shrinks below some configurable thresholds, the OS reconfigures the MMU to map each page of virtual addresses to an actual free page of RAM.

By choosing the thresholds smartly (see, for example, the theory of dynamic arrays), you can achieve amortized O(1) cost per stack operation while retaining the benefit of millions of stacks that can each grow as much as needed and only consume the memory they use.
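The dynamic-array argument can be checked with a tiny simulation. This is a sketch of the textbook doubling strategy only, with a made-up function name `copiesForDoubling`; it does not model shrinking or any real kernel or runtime policy.

```go
package main

import "fmt"

// copiesForDoubling simulates pushing n elements into a buffer that doubles
// its capacity whenever it is full, and returns the total number of elements
// copied during all growth steps.
func copiesForDoubling(n int) int {
	capacity, length, copies := 1, 0, 0
	for i := 0; i < n; i++ {
		if length == capacity {
			copies += length // existing contents move to the bigger buffer
			capacity *= 2
		}
		length++
	}
	return copies
}

func main() {
	for _, n := range []int{1000, 1000000} {
		c := copiesForDoubling(n)
		fmt.Printf("n=%d copies=%d copies/n=%.2f\n", n, c, float64(c)/float64(n))
	}
}
```

The total number of copies stays below 2n, so the growth work per push is bounded by a constant on average even though an individual growth step is expensive.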

PS: the current Go implementation is far behind any of this.

Answer 2

Score: 9

The Go core team is currently discussing the possibility of using contiguous stacks in a future Go version.

The split stack approach is useful because stacks can grow more flexibly, but it also requires the runtime to allocate a relatively big chunk of memory to distribute these stacks across. There has been a lot of confusion about Go's memory usage, in part because of this.

Making contiguous but growable (relocatable) stacks is an option that would provide the same flexibility and might reduce the confusion about Go's memory usage. It would also remedy some bad corner cases on low-memory machines (see linked thread).

As to advantages/disadvantages on 32-bit vs. 64-bit architectures, I don't think there are any directly associated solely with the use of segmented stacks.

Answer 3

Score: 2

Update: Go 1.4 (Q4 2014)

Change to the runtime:

> Up to Go 1.4, the runtime (garbage collector, concurrency support, interface management, maps, slices, strings, ...) was mostly written in C, with some assembler support.
In 1.4, much of the code has been translated to Go so that the garbage collector can scan the stacks of programs in the runtime and get accurate information about what variables are active.

> This rewrite allows the garbage collector in 1.4 to be fully precise, meaning that it is aware of the location of all active pointers in the program. This means the heap will be smaller as there will be no false positives keeping non-pointers alive. Other related changes also reduce the heap size, which is smaller by 10%-30% overall relative to the previous release.

> A consequence is that stacks are no longer segmented, eliminating the "hot split" problem. When a stack limit is reached, a new, larger stack is allocated, all active frames for the goroutine are copied there, and any pointers into the stack are updated.


Initial answer (March 2014)

The article "Contiguous stacks in Go" by Agis Anastasopoulos also addresses this issue:

> In such cases where the stack boundary happens to fall in a tight loop, the overhead of creating and destroying segments repeatedly becomes significant.
This is called the “hot split” problem inside the Go community.

> The “hot split” will be addressed in Go 1.3 by implementing contiguous stacks.
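The difference between the two schemes in a tight loop can be sketched with a toy count of allocations. Everything here is a simplified model with hypothetical names (`segmentedAllocs`, `contiguousAllocs`); stack sizes are measured in whole frames and the real runtime's bookkeeping is ignored.

```go
package main

import "fmt"

// segmentedAllocs counts segment allocations when a loop running at stack
// depth `depth` calls a one-frame function `calls` times and a segment holds
// segFrames frames. The extra segment is released on every return, so it
// has to be allocated again on the next call.
func segmentedAllocs(depth, segFrames, calls int) int {
	allocs := 0
	for i := 0; i < calls; i++ {
		if depth+1 > segFrames { // callee frame does not fit
			allocs++
			// callee runs, returns, and the segment is freed again
		}
	}
	return allocs
}

// contiguousAllocs counts allocations for the same loop when a full stack
// is replaced by one twice as large and is not shrunk inside the loop.
func contiguousAllocs(depth, capFrames, calls int) int {
	if capFrames < 1 {
		capFrames = 1
	}
	allocs := 0
	for i := 0; i < calls; i++ {
		for depth+1 > capFrames {
			capFrames *= 2
			allocs++
		}
	}
	return allocs
}

func main() {
	const depth, frames, calls = 8, 8, 1000000
	fmt.Println("segmented: ", segmentedAllocs(depth, frames, calls))
	fmt.Println("contiguous:", contiguousAllocs(depth, frames, calls))
}
```

In this model the segmented scheme pays once per iteration when the loop sits exactly on the boundary, while the contiguous scheme pays once in total.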

> Now when a stack needs to grow, instead of allocating a new segment the following happens:

> 1. Create a new, somewhat larger stack
> 2. Copy the contents of the old stack to the new stack
> 3. Re-adjust every copied pointer to point to the new addresses
> 4. Destroy the old stack
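The four steps can be mimicked on a toy model. This is only a sketch: `toyStack`, its simulated integer addresses, and the `isPtr` slice (a hypothetical stand-in for the pointer information the runtime needs) are all invented for the example, and the model grows upward for simplicity.

```go
package main

import "fmt"

// toyStack models a stack as words living at simulated addresses
// base, base+1, and so on. isPtr records which words hold pointers.
type toyStack struct {
	base  int
	words []int
	isPtr []bool
}

// grow copies old into a stack twice as large located at newBase and
// re-adjusts every pointer that referred to an address inside the old stack.
func grow(old toyStack, newBase int) toyStack {
	n := len(old.words)
	s := toyStack{ // step 1: create a new, larger stack
		base:  newBase,
		words: make([]int, 2*n),
		isPtr: make([]bool, 2*n),
	}
	copy(s.words, old.words) // step 2: copy the contents
	copy(s.isPtr, old.isPtr)
	delta := newBase - old.base
	for i := 0; i < n; i++ { // step 3: re-adjust pointers into the old stack
		w := s.words[i]
		if s.isPtr[i] && w >= old.base && w < old.base+n {
			s.words[i] = w + delta
		}
	}
	return s // step 4: the old stack is simply dropped
}

func main() {
	old := toyStack{
		base: 1000,
		// word 0: plain value, word 1: pointer to word 0,
		// word 2: integer that merely looks like a stack address,
		// word 3: pointer to something outside the stack.
		words: []int{42, 1000, 1001, 5000},
		isPtr: []bool{false, true, false, true},
	}
	s := grow(old, 9000)
	fmt.Println(s.words[:4])
	fmt.Println(s.words[s.words[1]-s.base]) // follow the moved pointer
}
```

Word 2 shows why step 3 needs to know what is a real pointer: an integer that happens to fall in the old stack's address range must not be rewritten.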

The following mentions one problem seen mainly on 32-bit architectures:

> There is a certain challenge though.
The 1.2 runtime doesn’t know if a pointer-sized word in the stack is an actual pointer or not. There may be floats and most rarely integers that if interpreted as pointers, would actually point to data.

> Due to the lack of such knowledge the garbage collector has to conservatively consider all the locations in the stack frames to be roots. This leaves the possibility for memory leaks especially on 32-bit architectures since their address pool is much smaller.

> When copying stacks however, such cases have to be avoided and only real pointers should be taken into account when re-adjusting.

> Work was done though and information about live stack pointers is now embedded in the binaries and is available to the runtime.
This means not only that the collector in 1.3 can precisely stack data but re-adjusting stack pointers is now possible.

huangapple
  • Posted on October 18, 2013, 20:50:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/19450145.html