Why passing pointers to channel is slower
Question
I'm a newbie to golang, trying to rewrite my Java server project in golang.

I found that passing pointers into a channel causes an almost 30% performance drop compared to passing values.

Here is a sample snippet:

    package main

    import (
        "time"
        "fmt"
    )

    var c = make(chan t, 1024)
    // var c = make(chan *t, 1024)

    type t struct {
        a uint
        b uint
    }

    func main() {
        start := time.Now()
        for i := 0; i < 1000; i++ {
            b := t{a: 3, b: 5}
            // c <- &b
            c <- b
        }
        elapsed := time.Since(start)
        fmt.Println(elapsed)
    }

#update: fixed the missing package statement
Answer 1
Score: 15
As a value it can be stack allocated:

    go run -gcflags '-m' tmp.go
    # command-line-arguments
    ./tmp.go:18: inlining call to time.Time.Nanosecond
    ./tmp.go:24: inlining call to time.Time.Nanosecond
    ./tmp.go:25: t2 escapes to heap
    ./tmp.go:25: main ... argument does not escape
    63613

As a pointer, it escapes to the heap:

    go run -gcflags '-m' tmp.go
    # command-line-arguments
    ./tmp.go:18: inlining call to time.Time.Nanosecond
    ./tmp.go:24: inlining call to time.Time.Nanosecond
    ./tmp.go:21: &b escapes to heap <-- Additional GC pressure
    ./tmp.go:20: moved to heap: b <--
    ./tmp.go:25: t2 escapes to heap
    ./tmp.go:25: main ... argument does not escape
    122513

Escaping to the heap introduces some overhead / GC pressure.

Looking at the assembly, the pointer version also introduces additional instructions, including:

    go run -gcflags '-S' tmp.go
    0x0055 00085 (...tmp.go:18) CALL runtime.newobject(SB)

The non-pointer variant doesn't incur this overhead before calling runtime.chansend1.
Answer 2
Score: 1
As a supplement to Martin Gallagher's good analysis, it must be added that the way you are measuring is suspect. The performance of such tiny programs varies a lot, so measurements should be repeated. There are also some mistakes in your example.

First: it doesn't compile, because the package statement is missing.

Second: there is an important difference between Nanoseconds and Nanosecond.

I tried to evaluate your observation this way<sup>*</sup>:
    package main

    import (
        "time"
        "fmt"
    )

    const (
        chan_size   = 1000
        cycle_count = 1000
    )

    var (
        v_ch = make(chan t, chan_size)
        p_ch = make(chan *t, chan_size)
    )

    type t struct {
        a uint
        b uint
    }

    func fill_v() {
        for i := 0; i < chan_size; i++ {
            b := t{a: 3, b: 5}
            v_ch <- b
        }
    }

    func fill_p() {
        for i := 0; i < chan_size; i++ {
            b := t{a: 3, b: 5}
            p_ch <- &b
        }
    }

    func measure_f(f func()) int64 {
        start := time.Now()
        f()
        elapsed := time.Since(start)
        return elapsed.Nanoseconds()
    }

    func main() {
        var v_nanos int64 = 0
        var p_nanos int64 = 0
        for i := 0; i < cycle_count; i++ {
            v_nanos += measure_f(fill_v)
            for i := 0; i < chan_size; i++ {
                _ = <-v_ch
            }
        }
        for i := 0; i < cycle_count; i++ {
            p_nanos += measure_f(fill_p)
            for i := 0; i < chan_size; i++ {
                _ = <-p_ch
            }
        }
        fmt.Println(
            "v:", v_nanos/cycle_count,
            " p:", p_nanos/cycle_count,
            "ratio (v/p):", float64(v_nanos)/float64(p_nanos))
    }
There is indeed a measurable performance drop (I define drop as drop = 1 - (candidate/optimum)), but although I repeat the code 1000 times, it varies between 25% and 50%. I'm not even sure how and when the heap is recycled, so it may be hard to quantify at all.
<sup>*</sup>See a "running" demo on ideone... note that stdout is frozen: v: 34875 p: 59420 ratio (v/p): 0.586923845267128

For some reason, it was not possible to run this code in the Go Playground.