Golang中for循环中select语句的性能表现

huangapple go评论88阅读模式
英文:

performance of golang select statement in a for loop

问题

我进行了一个测试来评估select语句的性能,并发现结果不太好。Go版本是1.7.3。

当运行上述代码时,你会发现每次添加一个serverDone case时,CPU使用率会上升(大约5%)。当移除所有的serverDone case时,CPU使用率约为5%,这并不好。

如果我将全局锁定对象(如serverDone)改为局部对象,性能会有所提升,但仍然不够好。

谁知道我的case中是否有任何问题,或者select语句的正确用法是什么?

英文:

I make a test to see the performance of select, and found the result is not
good. The go version is 1.7.3

package main

import (
    "fmt"
    "log"
    "os"
    "runtime/pprof"
    "time"
)

var serverDone = make(chan struct{})
var serverDone1 = make(chan struct{})
var serverDone2 = make(chan struct{})
var serverDone3 = make(chan struct{})
var serverDone4 = make(chan struct{})
var serverDone5 = make(chan struct{})

func main() {
    f, err := os.Create("cpu.pprof")
    if err != nil {
        log.Fatal(err)
    }
    pprof.StartCPUProfile(f)
    defer pprof.StopCPUProfile()

    for i := 0; i < 1000; i++ {
        go messageLoop()
    }
    <-time.After(10 * time.Second)
    close(serverDone)
    fmt.Println("finished")
}

func messageLoop() {
    var ticker = time.NewTicker(100 * time.Millisecond)
    defer ticker.Stop()
    var counter = 0
    for {
        select {
        case <-serverDone:
            return
        case <-serverDone1:
            return
        // case <-serverDone2:
        //  return
        // case <-serverDone3:
        //  return
        // case <-serverDone4:
        //  return
        // case <-serverDone5:
        //  return
        case <-ticker.C:
            counter += 1
        }
    }
}

When run the above code, you will find the CPU up(in my book, about 5%) each time when a serverDone case is added.
When all of the serverDone case are removed, the CPU is about 5%, It's not good.
If I turn globally locked object(like serverDone) to locally, the performance is better, but still not good enough.

Who knows is there anything wrong in my case, or what is the correct usage of select statement?

答案1

得分: 8

简短回答:Channels 使用互斥锁(mutex)。更多的通道意味着更多的 futex 系统调用。

这是程序的 strace 输出。

有 7 个 select 语句等待 7 个 channels 的代码:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.20    0.424434          13     33665      6061 futex
  1.09    0.004731          10       466           sched_yield
  0.47    0.002038          30        67           select
  0.11    0.000484           4       114           rt_sigaction
  0.05    0.000203           5        41         8 rt_sigreturn
  0.03    0.000128           9        15           mmap
  0.02    0.000081          27         3           clone
  0.01    0.000052           7         8           rt_sigprocmask
  0.01    0.000032          32         1           openat
  0.00    0.000011           4         3           setitimer
  0.00    0.000009           5         2           sigaltstack
  0.00    0.000008           8         1           munmap
  0.00    0.000006           6         1           execve
  0.00    0.000006           6         1           sched_getaffinity
  0.00    0.000004           4         1           arch_prctl
  0.00    0.000004           4         1           gettid
  0.00    0.000000           0         2         2 restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.432231                 34392      6071 total

有 3 个 select 语句等待 3 个 channels 的代码:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.47    0.118614          11     10384      1333 futex
  6.64    0.008704          11       791           sched_yield
  2.06    0.002706          23       120           select
  0.39    0.000512           4       114           rt_sigaction
  0.14    0.000181           8        22         2 rt_sigreturn
  0.10    0.000131           9        15           mmap
  0.05    0.000060          60         1           openat
  0.04    0.000057          19         3           setitimer
  0.04    0.000051          17         3           clone
  0.03    0.000045           6         8           rt_sigprocmask
  0.01    0.000009           9         1           execve
  0.01    0.000009           5         2           sigaltstack
  0.01    0.000009           9         1           sched_getaffinity
  0.01    0.000008           8         1           munmap
  0.01    0.000007           7         1           arch_prctl
  0.00    0.000005           5         1           gettid
------ ----------- ----------- --------- --------- ----------------
100.00    0.131108                 11468      1335 total

从这里可以看出,futex 调用的数量与通道的数量成正比,而 futex 系统调用是性能下降的原因。

以下是对此的解释

你可以在以下文件中找到通道的实现:src/runtime/chan.go

这是 hchan 结构体,用于表示一个通道:

type hchan struct {
    qcount   uint           // 队列中的数据总数
    dataqsiz uint           // 循环队列的大小
    buf      unsafe.Pointer // 指向包含 dataqsiz 个元素的数组
    elemsize uint16
    closed   uint32
    elemtype *_type // 元素类型
    sendx    uint   // 发送索引
    recvx    uint   // 接收索引
    recvq    waitq  // 接收等待者列表
    sendq    waitq  // 发送等待者列表
    lock     mutex
}

其中嵌入了一个 Lock 结构体,该结构体在 runtime2.go 中定义,根据操作系统的不同,可以作为互斥锁(futex)或信号量使用。

因此,随着通道数量的增加,会有更多的 futex 系统调用,从而影响性能。

你可以阅读更多相关信息:futex(2)Channels in steroids

英文:

Short Answer : Channels uses mutex. More channels means more futex system calls

Here is the strace on programs .

The code with 7 select statements waiting for 7 channels

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.20    0.424434          13     33665      6061 futex
  1.09    0.004731          10       466           sched_yield
  0.47    0.002038          30        67           select
  0.11    0.000484           4       114           rt_sigaction
  0.05    0.000203           5        41         8 rt_sigreturn
  0.03    0.000128           9        15           mmap
  0.02    0.000081          27         3           clone
  0.01    0.000052           7         8           rt_sigprocmask
  0.01    0.000032          32         1           openat
  0.00    0.000011           4         3           setitimer
  0.00    0.000009           5         2           sigaltstack
  0.00    0.000008           8         1           munmap
  0.00    0.000006           6         1           execve
  0.00    0.000006           6         1           sched_getaffinity
  0.00    0.000004           4         1           arch_prctl
  0.00    0.000004           4         1           gettid
  0.00    0.000000           0         2         2 restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.432231                 34392      6071 total

The code with 3 select statements waiting for 3 channels

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.47    0.118614          11     10384      1333 futex
  6.64    0.008704          11       791           sched_yield
  2.06    0.002706          23       120           select
  0.39    0.000512           4       114           rt_sigaction
  0.14    0.000181           8        22         2 rt_sigreturn
  0.10    0.000131           9        15           mmap
  0.05    0.000060          60         1           openat
  0.04    0.000057          19         3           setitimer
  0.04    0.000051          17         3           clone
  0.03    0.000045           6         8           rt_sigprocmask
  0.01    0.000009           9         1           execve
  0.01    0.000009           5         2           sigaltstack
  0.01    0.000009           9         1           sched_getaffinity
  0.01    0.000008           8         1           munmap
  0.01    0.000007           7         1           arch_prctl
  0.00    0.000005           5         1           gettid
------ ----------- ----------- --------- --------- ----------------
100.00    0.131108                 11468      1335 total

As it is clear here the number of futex calls are proportional to the number of channels and futex system calls are the reason for this performance .

Here is explanation on that

You may find the channel implementation in the following file src/runtime/chan.go .

Here is hchan the struct for a channel

type hchan struct {
	qcount   uint           // total data in the queue
	dataqsiz uint           // size of the circular queue
	buf      unsafe.Pointer // points to an array of dataqsiz elements
	elemsize uint16
	closed   uint32
	elemtype *_type // element type
	sendx    uint   // send index
	recvx    uint   // receive index
	recvq    waitq  // list of recv waiters
	sendq    waitq  // list of send waiters
	lock     mutex
}

There's a Lock embedded structure that is defined in runtime2.go and that serves as a mutex (futex) or semaphore depending on the OS.

So with increase in number of channels more futex system call calls be there and that would affect performance

You may read more about these in : futex(2),Channels in steroids

huangapple
  • 本文由 发表于 2017年2月6日 11:03:46
  • 转载请务必保留本文链接:https://go.coder-hub.com/42059800.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定