How to detect what is preventing multiple cores being used in golang?

Question

So, I have a piece of concurrent code that is meant to run on every CPU/core.

There are two large vectors with input/output values:

var (
    input = make([]float64, rowCount)
    output = make([]float64, rowCount)
)

These are already filled, and I want to compute the distance (error) between each input-output pair. Since the pairs are independent, a possible concurrent version is the following:

var d float64 // Error to be computed
// Set up a worker "for each CPU"
ch := make(chan float64)
nw := runtime.NumCPU()
for w := 0; w < nw; w++ {
    go func(id int) {
        var wd float64
        // e.g. nw = 4
        // worker0, i = 0, 4, 8, 12...
        // worker1, i = 1, 5, 9, 13...
        // worker2, i = 2, 6, 10, 14...
        // worker3, i = 3, 7, 11, 15...
        for i := id; i < rowCount; i += nw {
            res := compute(input[i])
            wd += distance(res, output[i])
        }
        ch <- wd
    }(w)
}
// Compute total distance
for w := 0; w < nw; w++ {
    d += <-ch
}

The idea is to have a single worker for each CPU/core, and each worker processes a subset of the rows.

The problem I'm having is that this code is no faster than the serial code.

Now, I'm using Go 1.7, so runtime.GOMAXPROCS should already be set to runtime.NumCPU(), but even setting it explicitly does not improve performance.

  • distance is just (a-b)*(a-b);
  • compute is a bit more complex, but should be reentrant and use global data only for reading (and uses math.Pow and math.Sqrt functions);
  • no other goroutine is running.

So, besides accessing the global data (input/output) for reading, there are no locks/mutexes that I am aware of (not using math/rand, for example).

I also compiled with -race and nothing emerged.

My host has 4 virtual cores, but when I run this code htop shows CPU usage of about 102%, whereas I expected something around 380%, as happened in the past with other Go code that used all the cores.

I would like to investigate, but I don't know how the runtime allocates threads and schedules goroutines.

How can I debug this kind of issue? Can pprof help me in this case? What about the runtime package?

Thanks in advance
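
One way to start investigating the questions above, as a minimal sketch only: print what the scheduler is allowed to use, record a CPU profile with runtime/pprof, and time the same work with one worker and with one worker per CPU. The helper names sumSqrt and parallelSum, the row count, and the file name cpu.prof are assumptions made for illustration; sumSqrt merely stands in for the real compute/distance pair.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

// sumSqrt is a hypothetical stand-in for the per-row work: worker id
// handles rows id, id+nw, id+2*nw, ... below n.
func sumSqrt(id, nw, n int) float64 {
	var s float64
	for i := id; i < n; i += nw {
		s += math.Sqrt(float64(i))
	}
	return s
}

// parallelSum starts nw goroutines over n rows and adds up their partial sums.
func parallelSum(nw, n int) float64 {
	ch := make(chan float64)
	for w := 0; w < nw; w++ {
		go func(id int) { ch <- sumSqrt(id, nw, n) }(w)
	}
	var total float64
	for w := 0; w < nw; w++ {
		total += <-ch
	}
	return total
}

func run() error {
	// Confirm how many OS threads may execute Go code at once.
	fmt.Println("NumCPU:", runtime.NumCPU(), "GOMAXPROCS:", runtime.GOMAXPROCS(0))

	f, err := os.Create("cpu.prof") // hypothetical output file name
	if err != nil {
		return err
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	defer pprof.StopCPUProfile()

	const n = 50000000 // arbitrary row count for the experiment
	for _, nw := range []int{1, runtime.NumCPU()} {
		start := time.Now()
		total := parallelSum(nw, n)
		fmt.Printf("workers=%d total=%.3e elapsed=%v\n", nw, total, time.Since(start))
	}
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The resulting profile can be inspected with go tool pprof cpu.prof, and running the program with GODEBUG=schedtrace=1000 makes the runtime print scheduler state once per second. If the timings barely differ between the two worker counts, the profile should help show where the time is actually going.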

Answer 1

Score: 1

Sorry, but in the end I got the measurement wrong. @JimB was right, and I had a minor leak, but not enough to justify a slowdown of this magnitude.

My expectations were too high: the function I was making concurrent was called only at the beginning of the program, therefore the performance improvement was just minor.
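
The effect described here is what Amdahl's law predicts: if only a fraction p of the total run time is parallelized across n workers, the overall speedup is 1 / ((1 - p) + p/n). A small sketch of the arithmetic follows; the function name amdahlSpeedup and the example fractions are hypothetical and not measured from the program in question.

```go
package main

import "fmt"

// amdahlSpeedup returns the overall speedup when a fraction p of the
// total run time is spread perfectly across n workers (Amdahl's law).
func amdahlSpeedup(p float64, n int) float64 {
	return 1 / ((1 - p) + p/float64(n))
}

func main() {
	// Hypothetical fractions of run time spent in the parallelized section.
	for _, p := range []float64{0.05, 0.5, 0.95} {
		fmt.Printf("parallel fraction %.2f on 4 cores: %.2fx overall\n",
			p, amdahlSpeedup(p, 4))
	}
}
```

When the parallelized section accounts for only a few percent of the run time, even a perfect 4x speedup of that section is hard to see in the total, which matches the minor improvement reported above.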

After applying the pattern to other sections of the program, I got the expected results. My mistake was in evaluating which section was the most important.

Anyway, I learned a lot of interesting things meanwhile, so thanks a lot to all the people trying to help!

huangapple
  • Posted on 2017-02-22 04:30:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/42377346.html