Golang slice allocation performance

Question

I stumbled upon something interesting while checking the performance of memory allocation in Go.

package main

import (
    "fmt"
    "time"
)

func main() {
    const alloc int = 65536 // bytes per slice; compare with 65535
    now := time.Now()
    loop := 50000
    for i := 0; i < loop; i++ {
        sl := make([]byte, alloc)
        i += len(sl) * 0 // no-op that keeps sl in use
    }
    elapsed := time.Since(now)
    fmt.Printf("took %s to allocate %d bytes %d times", elapsed, alloc, loop)
}

I am running this on a Core i7-2600 with Go version 1.6 64-bit (same results on 32-bit) and 16GB of RAM, on Windows 10.
When alloc is 65536 (exactly 64K), it runs for 30 seconds (!).
When alloc is 65535, it takes ~200ms.
Can someone explain this to me, please?
I tried the same code at home on my Core i7-920 @ 3.8GHz, but it didn't show the same results (both took around 200ms). Does anyone have an idea what's going on?

Answer 1

Score: 8

Setting GOGC=off improved performance (down to less than 100ms). Why?
Because of escape analysis. When you build with go build -gcflags -m, the compiler prints which allocations escape to the heap. It really depends on your machine and Go compiler version, but when the compiler decides that an allocation should move to the heap, it means two things:

  1. The allocation takes longer (since "allocating" on the stack is just one CPU instruction).
  2. The GC has to clean up that memory later, which costs more CPU time.

On my machine, the allocation of 65536 bytes escapes to the heap and the allocation of 65535 bytes doesn't.
That's why one byte changed the whole process from 200ms to 30s. Amazing.
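
To see the garbage collector's share of the cost from inside a program, one option is to count completed GC cycles while allocating, with the collector enabled and disabled. The sketch below is only an illustration, not the original poster's code: it assumes that debug.SetGCPercent(-1) is an acceptable in-process stand-in for GOGC=off, and the names gcCyclesDuring and sink, the 128 KiB size, and the iteration counts are all made up for the example.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink is a hypothetical package-level variable; storing the slice here
// forces every allocation onto the heap.
var sink []byte

// gcCyclesDuring (hypothetical helper) sets the GC percentage, allocates
// 128 KiB slices in a loop, and returns how many GC cycles completed
// while the loop ran. The previous GC setting is restored on return.
func gcCyclesDuring(gcPercent, iterations int) uint32 {
	old := debug.SetGCPercent(gcPercent)
	defer debug.SetGCPercent(old)

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < iterations; i++ {
		sink = make([]byte, 128*1024)
	}
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	// A negative percentage disables the collector, like GOGC=off.
	// Keep the loop short here: nothing is freed while GC is disabled.
	fmt.Println("GC cycles with GC disabled:", gcCyclesDuring(-1, 200))

	// With the default setting the collector has to keep up with the churn.
	fmt.Println("GC cycles with GOGC=100:  ", gcCyclesDuring(100, 2000))
}
```

With the collector disabled the count should stay at zero, at the price of memory that is never reclaimed, so this is a diagnostic aid rather than a fix.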

Answer 2

Score: 4

Note/Update 2021: as Tapir Liu notes in Go101 with this tweet:

> As of Go 1.17, the Go runtime will allocate the elements of slice x on the stack if the compiler proves they are only used in the current goroutine and N <= 64KB:
>
> var x = make([]byte, N)
>
> And the Go runtime will allocate the array y on the stack if the compiler proves it is only used in the current goroutine and N <= 10MB:
>
> var y [N]byte
>
> Then how to allocate (the elements of) a slice whose size is larger than 64KB but not larger than 10MB on the stack (when the slice is only used in one goroutine)?
>
> Just use the following way:
>
> var y [N]byte
> var x = y[:]

Since stack allocation is faster than heap allocation, this directly affects your test when alloc is 65536 or more.

Tapir adds:

> In fact, we could allocate slices with arbitrary sum element sizes on stack.
>
> const N = 500 * 1024 * 1024 // 500M
> var v byte = 123
>
> func createSlice() byte {
>     var s = []byte{N: 0}
>     for i := range s { s[i] = v }
>     return s[v]
> }
>
> Changing 500 to 512 makes the program crash.

Answer 3

Score: 0

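The reason is very simple.

const alloc int = 65535

0x0000 00000 (example.go:8) TEXT "".main(SB), ABIInternal, $65784-0

const alloc int = 65536

0x0000 00000 (example.go:8) TEXT "".main(SB), ABIInternal, $248-0

The difference is where the slice is created. The value after the $ is the size of main's stack frame: 65784 bytes when alloc is 65535, because the slice's backing array lives in the frame, versus 248 bytes when alloc is 65536, because the backing array is allocated on the heap instead.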
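
The same stack-versus-heap difference can be observed without reading assembly by counting heap allocations per call. This is a minimal sketch, assuming testing.AllocsPerRun as the measuring tool. The sizes are deliberately kept away from the 64 KiB boundary, because whether exactly 65536 bytes stays on the stack has varied between compiler versions. The names sink, allocsSmall, and allocsLarge are hypothetical.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a hypothetical variable that keeps the slice contents in use.
var sink byte

// allocsSmall reports heap allocations per call for a 32 KiB slice,
// a constant size well under the 64 KiB limit discussed above.
func allocsSmall() float64 {
	return testing.AllocsPerRun(100, func() {
		sl := make([]byte, 32*1024)
		sl[0] = 1
		sink += sl[len(sl)-1] + sl[0]
	})
}

// allocsLarge reports heap allocations per call for a 128 KiB slice,
// a constant size well over the limit.
func allocsLarge() float64 {
	return testing.AllocsPerRun(100, func() {
		sl := make([]byte, 128*1024)
		sl[0] = 1
		sink += sl[len(sl)-1] + sl[0]
	})
}

func main() {
	fmt.Printf("32 KiB:  %.0f heap allocations per call\n", allocsSmall())
	fmt.Printf("128 KiB: %.0f heap allocations per call\n", allocsLarge())
}
```

Changing the two constants to 65535 and 65536 reproduces the comparison from the question on whichever Go version is installed. The size has to stay a compile-time constant for the compiler to consider the stack at all.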

huangapple
  • Posted on March 21, 2016, 16:16:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/36125927.html