

Split stacks unnecessary on amd64






There seems to be an opinion out there that using a "split stack" runtime model is unnecessary on 64-bit architectures. I say "seems to be" because I haven't seen anyone actually say that, only dance around it:

> The memory usage of a typical multi-threaded program can decrease
> significantly, as each thread does not require a worst-case stack
> size. It becomes possible to run millions of threads (either full NPTL
> threads or co-routines) in a 32-bit address space.
-- Ian Lance Taylor

...implying that a 64-bit address space can already handle it.


> ... the constant overhead of split stacks and the narrow use case
> (spawning enormous numbers of I/O-bound tasks on 32-bit architectures)
> isn't acceptable...
-- bstrie

Two questions: first, is this what they are saying? Second, if so, why are split stacks unnecessary on 64-bit architectures?


Score: 19







Yes, that's what they are saying.

Split stacks are (currently) unnecessary on 64-bit architectures because the 64-bit virtual address space is so large that it can contain millions of stack address ranges, each as large as an entire 32-bit address space, if needed.
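To put rough numbers on that, here is a small sketch that counts how many stack-sized ranges fit in a given address width. The function name `stackRanges` and the chosen widths are my own assumptions for illustration: 64 bits is the architectural pointer size, while 48 bits is the virtual address width that amd64 processors commonly implement.

```go
package main

import "fmt"

// stackRanges returns how many non-overlapping ranges of 2^stackBits bytes
// fit in an address space of 2^addrBits bytes. (Hypothetical helper.)
func stackRanges(addrBits, stackBits uint) uint64 {
	return uint64(1) << (addrBits - stackBits)
}

func main() {
	fmt.Println(stackRanges(64, 32)) // 4 GiB ranges in a full 64-bit space
	fmt.Println(stackRanges(48, 32)) // 4 GiB ranges in a 48-bit space (assumed width)
	fmt.Println(stackRanges(48, 27)) // 128 MiB ranges in a 48-bit space (assumed width)
}
```

With all 64 bits there is room for about four billion 4 GiB ranges. With 48 implemented bits, only tens of thousands of 4 GiB ranges fit, so millions of stacks would need smaller reservations, on the order of 128 MiB each.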

In the flat memory model in use nowadays, the translation from virtual addresses to physical memory locations is done by the hardware MMU. On amd64 it turns out to be better (meaning faster overall) to reserve a big chunk of the 64-bit virtual address space for each new stack you create, while mapping only the first page (4 KiB) to actual RAM. This way the stack can grow and shrink as needed over contiguous virtual addresses (meaning less code in each function prologue, a big optimization), while the OS reconfigures the MMU to map each page of virtual addresses to a free page of RAM whenever the stack grows or shrinks past some configurable thresholds.

By choosing the thresholds smartly (see for example the theory of dynamic arrays), you can achieve amortized O(1) cost per stack operation, while keeping the benefits of millions of stacks that can grow as much as needed and only consume the memory they actually use.
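As a rough illustration of the threshold idea, the sketch below models one stack inside a reserved range and only counts how often the mapping would have to change. Nothing here talks to the OS; `lazyStack`, `push`, `pop` and the double-when-full, halve-at-one-quarter policy are assumptions borrowed from the usual dynamic array analysis, not a description of any real runtime.

```go
package main

import "fmt"

// lazyStack is a hypothetical model of one stack inside a large reserved
// virtual range. It only tracks page counts; no real memory is mapped.
type lazyStack struct {
	used      int // pages currently in use
	committed int // pages currently backed by RAM
	remaps    int // how many times the mapping had to change
}

func newLazyStack() *lazyStack {
	return &lazyStack{committed: 1} // only the first page is mapped up front
}

// push uses one more page, doubling the committed region when it is full.
func (s *lazyStack) push() {
	if s.used == s.committed {
		s.committed *= 2
		s.remaps++
	}
	s.used++
}

// pop releases one page, halving the committed region only once usage
// falls to a quarter of it, so push/pop at a boundary does not thrash.
func (s *lazyStack) pop() {
	if s.used == 0 {
		return
	}
	s.used--
	if s.committed > 1 && s.used <= s.committed/4 {
		s.committed /= 2
		s.remaps++
	}
}

func main() {
	s := newLazyStack()
	for i := 0; i < 1024; i++ {
		s.push()
	}
	grow := s.remaps
	// Oscillate right at the boundary between 1024 and 1025 pages.
	for i := 0; i < 1000; i++ {
		s.push()
		s.pop()
	}
	fmt.Println("remaps while growing to 1024 pages:", grow)
	fmt.Println("remaps during 1000 push/pop pairs:", s.remaps-grow)
}
```

Growing to 1024 pages takes 10 remaps, and the 1000 push/pop pairs at the boundary add only one more, because the shrink threshold sits well below the grow threshold.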

PS: the current Go implementation is far behind any of this.


Score: 9






The Go core team is currently discussing the possibility of using contiguous stacks in a future Go version.

The split stack approach is useful because stacks can grow more flexibly but it also requires that the runtime allocates a relatively big chunk of memory to distribute these stacks across. There has been a lot of confusion about Go's memory usage, in part because of this.

Making stacks contiguous but growable (relocatable) is an option that would provide the same flexibility and might reduce the confusion about Go's memory usage, as well as remedy some bad corner cases on low-memory machines (see linked thread).

As to advantages/disadvantages on 32-bit vs. 64-bit architectures, I don't think there are any directly associated solely with the use of segmented stacks.


Score: 2



Update Go 1.4 (Q4 2014)

Change to the runtime:

> Up to Go 1.4, the runtime (garbage collector, concurrency support, interface management, maps, slices, strings, ...) was mostly written in C, with some assembler support.
In 1.4, much of the code has been translated to Go so that the garbage collector can scan the stacks of programs in the runtime and get accurate information about what variables are active.

> This rewrite allows the garbage collector in 1.4 to be fully precise, meaning that it is aware of the location of all active pointers in the program. This means the heap will be smaller as there will be no false positives keeping non-pointers alive. Other related changes also reduce the heap size, which is smaller by 10%-30% overall relative to the previous release.

> A consequence is that stacks are no longer segmented, eliminating the "hot split" problem. When a stack limit is reached, a new, larger stack is allocated, all active frames for the goroutine are copied there, and any pointers into the stack are updated.

Initial answer (March 2014)

The article "Contiguous stacks in Go" by Agis Anastasopoulos also addresses this issue:

> In such cases where the stack boundary happens to fall in a tight loop, the overhead of creating and destroying segments repeatedly becomes significant.
This is called the “hot split” problem inside the Go community.
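A toy model of the hot split, assuming a segment holds a fixed number of frames (`segmentFrames`, a made-up constant) and that a segment is freed as soon as the call depth drops back below its boundary. The function names are hypothetical, and the contiguous variant in this sketch never shrinks.

```go
package main

import "fmt"

const segmentFrames = 4 // hypothetical: frames that fit in one segment

// segmentedAllocs counts segment allocations for a sequence of call depths
// under a segmented model: crossing into a new segment allocates it, and
// returning below the boundary frees it again.
func segmentedAllocs(depths []int) int {
	allocs, segments := 0, 1
	for _, d := range depths {
		need := (d + segmentFrames - 1) / segmentFrames
		if need < 1 {
			need = 1
		}
		if need > segments {
			allocs += need - segments
		}
		segments = need
	}
	return allocs
}

// contiguousAllocs counts allocations under a copying model: the stack
// doubles when it is too small and, in this sketch, never shrinks.
func contiguousAllocs(depths []int) int {
	allocs, capacity := 0, segmentFrames
	for _, d := range depths {
		for d > capacity {
			capacity *= 2
			allocs++
		}
	}
	return allocs
}

func main() {
	// A tight loop calling one function right at the boundary:
	// depth 4 -> 5 -> 4 -> 5 ...
	var depths []int
	for i := 0; i < 1000; i++ {
		depths = append(depths, 4, 5)
	}
	fmt.Println("segmented allocations:", segmentedAllocs(depths))
	fmt.Println("contiguous allocations:", contiguousAllocs(depths))
}
```

In this model the segmented stack allocates a segment on every iteration (1000 in total), while the contiguous stack grows once and then stays put.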

> The “hot split” will be addressed in Go 1.3 by implementing contiguous stacks.

> Now when a stack needs to grow, instead of allocating a new segment the following happens:

> 1. Create a new, somewhat larger stack
> 2. Copy the contents of the old stack to the new stack
> 3. Re-adjust every copied pointer to point to the new addresses
> 4. Destroy the old stack
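The four steps can be sketched with a simulated stack. This is only a model under my own assumptions: addresses are plain numbers, the stack grows upward for simplicity, and the `isPtr` bitmap stands in for whatever pointer information the runtime has. None of the names (`stack`, `grow`) come from the Go runtime.

```go
package main

import "fmt"

// stack is a hypothetical model of a goroutine stack: words holds the raw
// slots, base is the simulated address of words[0], and isPtr marks the
// slots known to hold pointers into this same stack.
type stack struct {
	base  uintptr
	words []uintptr
	isPtr []bool
}

// grow follows the four steps: allocate a larger stack, copy the contents,
// re-adjust the known pointers, and drop the old stack.
func grow(old *stack, newBase uintptr) *stack {
	n := len(old.words)
	s := &stack{
		base:  newBase,
		words: make([]uintptr, 2*n), // 1. new, larger stack
		isPtr: make([]bool, 2*n),
	}
	copy(s.words, old.words) // 2. copy the contents
	copy(s.isPtr, old.isPtr)
	delta := newBase - old.base
	for i := 0; i < n; i++ {
		if s.isPtr[i] { // 3. adjust real pointers only
			s.words[i] += delta
		}
	}
	return s // 4. the old stack is no longer referenced
}

func main() {
	old := &stack{
		base: 0x1000,
		// Slot 3 holds an integer that merely looks like a stack address.
		words: []uintptr{42, 0x1000, 0x1008, 0x1000},
		isPtr: []bool{false, true, true, false},
	}
	s := grow(old, 0x8000)
	for i, w := range s.words[:4] {
		fmt.Printf("slot %d: %#x\n", i, w)
	}
}
```

Slots 1 and 2 move with the stack, while slot 3 keeps its value because it is not marked as a pointer. Knowing which slots are real pointers is what makes step 3 safe.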

The following mentions one problem seen mainly on 32-bit architectures:

> There is a certain challenge though.
The 1.2 runtime doesn’t know if a pointer-sized word in the stack is an actual pointer or not. There may be floats and most rarely integers that if interpreted as pointers, would actually point to data.

> Due to the lack of such knowledge the garbage collector has to conservatively consider all the locations in the stack frames to be roots. This leaves the possibility for memory leaks especially on 32-bit architectures since their address pool is much smaller.

> When copying stacks however, such cases have to be avoided and only real pointers should be taken into account when re-adjusting.

> Work was done though and information about live stack pointers is now embedded in the binaries and is available to the runtime.
This means not only that the collector in 1.3 can precisely stack data but re-adjusting stack pointers is now possible.
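A back-of-the-envelope way to see why conservative scanning hurts more on 32-bit: if a non-pointer word were uniformly random (a strong simplifying assumption, real data is not), the chance that it looks like a pointer into the heap is the heap size divided by the size of the address space. The helper name and the 1 GiB heap are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// falsePositiveChance estimates the chance that a uniformly random word of
// addrBits bits happens to fall inside a heap of heapBytes bytes.
// (Hypothetical helper; uniform randomness is an assumption.)
func falsePositiveChance(heapBytes float64, addrBits int) float64 {
	return heapBytes / math.Pow(2, float64(addrBits))
}

func main() {
	heap := math.Pow(2, 30) // assumed 1 GiB heap
	fmt.Printf("32-bit: %.2g\n", falsePositiveChance(heap, 32))
	fmt.Printf("64-bit: %.2g\n", falsePositiveChance(heap, 64))
}
```

Under that assumption a 1 GiB heap covers a quarter of a 32-bit address space but a vanishing fraction of a 64-bit one, so a stray float or integer is far more likely to keep garbage alive on 32-bit.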

  • Posted on October 18, 2013, 20:50:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/19450145.html



