Performance of golang select statement in a for loop
Question
I ran a test to measure the performance of select and found the result disappointing. The Go version is 1.7.3.
package main

import (
    "fmt"
    "log"
    "os"
    "runtime/pprof"
    "time"
)

// Global shutdown channels; only serverDone is ever closed.
var serverDone = make(chan struct{})
var serverDone1 = make(chan struct{})
var serverDone2 = make(chan struct{})
var serverDone3 = make(chan struct{})
var serverDone4 = make(chan struct{})
var serverDone5 = make(chan struct{})

func main() {
    f, err := os.Create("cpu.pprof")
    if err != nil {
        log.Fatal(err)
    }
    if err := pprof.StartCPUProfile(f); err != nil {
        log.Fatal(err)
    }
    defer pprof.StopCPUProfile()

    // Start 1000 goroutines, each running its own select loop.
    for i := 0; i < 1000; i++ {
        go messageLoop()
    }

    <-time.After(10 * time.Second)
    close(serverDone)
    fmt.Println("finished")
}

// messageLoop wakes on a 100ms ticker until one of the done channels fires.
func messageLoop() {
    ticker := time.NewTicker(100 * time.Millisecond)
    defer ticker.Stop()
    counter := 0
    for {
        select {
        case <-serverDone:
            return
        case <-serverDone1:
            return
        // case <-serverDone2:
        //     return
        // case <-serverDone3:
        //     return
        // case <-serverDone4:
        //     return
        // case <-serverDone5:
        //     return
        case <-ticker.C:
            counter++
        }
    }
}
When you run the above code, CPU usage goes up (by about 5% in my tests) each time a serverDone case is added.
Even with all of the serverDone cases removed, CPU usage is about 5%, which is not good.
If I make the global objects (such as serverDone) local instead, performance improves, but it is still not good enough.
Is there anything wrong with my code, or what is the correct way to use the select statement?
Answer 1
Score: 8
Short answer: channels use a mutex. More channels means more futex system calls.
Here is the strace summary for the program.
The code with a select statement waiting on 7 channels:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.20    0.424434          13     33665      6061 futex
  1.09    0.004731          10       466           sched_yield
  0.47    0.002038          30        67           select
  0.11    0.000484           4       114           rt_sigaction
  0.05    0.000203           5        41         8 rt_sigreturn
  0.03    0.000128           9        15           mmap
  0.02    0.000081          27         3           clone
  0.01    0.000052           7         8           rt_sigprocmask
  0.01    0.000032          32         1           openat
  0.00    0.000011           4         3           setitimer
  0.00    0.000009           5         2           sigaltstack
  0.00    0.000008           8         1           munmap
  0.00    0.000006           6         1           execve
  0.00    0.000006           6         1           sched_getaffinity
  0.00    0.000004           4         1           arch_prctl
  0.00    0.000004           4         1           gettid
  0.00    0.000000           0         2         2 restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.432231                 34392      6071 total
The code with a select statement waiting on 3 channels:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.47    0.118614          11     10384      1333 futex
  6.64    0.008704          11       791           sched_yield
  2.06    0.002706          23       120           select
  0.39    0.000512           4       114           rt_sigaction
  0.14    0.000181           8        22         2 rt_sigreturn
  0.10    0.000131           9        15           mmap
  0.05    0.000060          60         1           openat
  0.04    0.000057          19         3           setitimer
  0.04    0.000051          17         3           clone
  0.03    0.000045           6         8           rt_sigprocmask
  0.01    0.000009           9         1           execve
  0.01    0.000009           5         2           sigaltstack
  0.01    0.000009           9         1           sched_getaffinity
  0.01    0.000008           8         1           munmap
  0.01    0.000007           7         1           arch_prctl
  0.00    0.000005           5         1           gettid
------ ----------- ----------- --------- --------- ----------------
100.00    0.131108                 11468      1335 total
As the output shows, the number of futex calls grows with the number of channels (33665 calls with 7 channels versus 10384 with 3), and these futex system calls are the reason for the performance you see.
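A rough way to see the per-case cost without strace is a micro-benchmark along the lines of the sketch below. It is not part of the original measurements: the function names are made up for illustration, and it times a single goroutine rather than counting futex calls, so it only suggests that a select listing more channels does more work per iteration.

```go
package main

import (
	"fmt"
	"time"
)

// recvTwoCases performs n send/receive round trips through ch using a
// select with two cases and returns how many values it received.
// ch must be buffered so that the send does not block.
func recvTwoCases(ch, idle chan int, n int) int {
	received := 0
	for i := 0; i < n; i++ {
		ch <- i
		select {
		case <-ch:
			received++
		case <-idle: // never ready
		}
	}
	return received
}

// recvFiveCases does the same work, but the select lists four idle
// channels that never become ready.
func recvFiveCases(ch, idle1, idle2, idle3, idle4 chan int, n int) int {
	received := 0
	for i := 0; i < n; i++ {
		ch <- i
		select {
		case <-ch:
			received++
		case <-idle1:
		case <-idle2:
		case <-idle3:
		case <-idle4:
		}
	}
	return received
}

func main() {
	const n = 1000000
	ch := make(chan int, 1)
	i1, i2, i3, i4 := make(chan int), make(chan int), make(chan int), make(chan int)

	start := time.Now()
	got := recvTwoCases(ch, i1, n)
	fmt.Println("2 cases:", got, "receives in", time.Since(start))

	start = time.Now()
	got = recvFiveCases(ch, i1, i2, i3, i4, n)
	fmt.Println("5 cases:", got, "receives in", time.Since(start))
}
```

The absolute timings depend on the machine and Go version, so compare the two lines against each other rather than against the strace numbers above.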
Here is the explanation.
You can find the channel implementation in the file src/runtime/chan.go.
This is hchan, the struct that represents a channel:
type hchan struct {
    qcount   uint           // total data in the queue
    dataqsiz uint           // size of the circular queue
    buf      unsafe.Pointer // points to an array of dataqsiz elements
    elemsize uint16
    closed   uint32
    elemtype *_type // element type
    sendx    uint   // send index
    recvx    uint   // receive index
    recvq    waitq  // list of recv waiters
    sendq    waitq  // list of send waiters
    lock     mutex
}
The struct has a lock field of type mutex, which is defined in runtime2.go and is implemented as a futex or a semaphore depending on the OS.
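To make the link between select and the per-channel lock concrete, here is a toy model. It is only a sketch, not the runtime's code: toyChan and toySelect are hypothetical names, and sync.Mutex stands in for the runtime's internal mutex. Like the runtime, it locks every channel listed in the select before checking which one is ready. Unlike the runtime, it polls in listed order and returns -1 instead of blocking.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// toyChan is a hypothetical stand-in for hchan: a queue guarded by its own lock.
type toyChan struct {
	id    int        // used only to fix a global lock order
	mu    sync.Mutex // plays the role of hchan.lock
	queue []int
}

// toySelect receives from the first non-empty channel in chans.
// It returns the index of that channel (-1 if none is ready), the value
// received, and how many locks it had to take.
// It assumes each channel appears at most once in chans.
func toySelect(chans []*toyChan) (idx, val, locks int) {
	// Lock every channel in a fixed order so that two concurrent
	// toySelect calls cannot deadlock each other.
	order := append([]*toyChan(nil), chans...)
	sort.Slice(order, func(i, j int) bool { return order[i].id < order[j].id })
	for _, c := range order {
		c.mu.Lock()
		locks++
	}
	defer func() {
		for i := len(order) - 1; i >= 0; i-- {
			order[i].mu.Unlock()
		}
	}()

	for i, c := range chans {
		if len(c.queue) > 0 {
			val, c.queue = c.queue[0], c.queue[1:]
			return i, val, locks
		}
	}
	return -1, 0, locks
}

func main() {
	chans := []*toyChan{{id: 0}, {id: 1}, {id: 2}}
	chans[2].queue = append(chans[2].queue, 42)

	idx, val, locks := toySelect(chans)
	fmt.Println("ready:", idx, "value:", val, "locks taken:", locks)
}
```

With three channels the toy takes three locks even though only one channel has data, which is the per-case cost the strace output hints at.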
So as the number of channels increases, there are more futex system calls, and that affects performance.
You can read more about this in futex(2) and Channels in steroids.