Why is goroutine allocation slower on multiple cores?

Question

I was doing some experiments in Go and I found something really odd. When I run the following code on my computer, it executes in about 0.5 seconds.

package main

import (
    "fmt"
    "runtime"
    "time"
)

func waitAround(die chan bool) {
    <-die
}

func main() {
    var startMemory runtime.MemStats
    runtime.ReadMemStats(&startMemory)

    start := time.Now()
    cpus := runtime.NumCPU()
    runtime.GOMAXPROCS(cpus)
    die := make(chan bool)
    count := 100000
    for i := 0; i < count; i++ {
        go waitAround(die)
    }
    elapsed := time.Since(start)

    var endMemory runtime.MemStats
    runtime.ReadMemStats(&endMemory)

    fmt.Printf("Started %d goroutines\n%d CPUs\n%f seconds\n",
        count, cpus, elapsed.Seconds())
    fmt.Printf("Memory before %d\nmemory after %d\n", startMemory.Alloc,
        endMemory.Alloc)
    fmt.Printf("%d goroutines running\n", runtime.NumGoroutine())
    fmt.Printf("%d bytes per goroutine\n", (endMemory.Alloc-startMemory.Alloc)/uint64(runtime.NumGoroutine()))

    close(die)
}

However, when I execute it with runtime.GOMAXPROCS(1) it runs much faster (about 0.15 seconds). Can anybody explain why starting many goroutines is slower when using more cores? Is there significant overhead in multiplexing the goroutines onto multiple cores? I realize the goroutines aren't doing anything, and it would probably be a different story if I had to wait for them to actually do some work.
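
For reference, the comparison described above can be reproduced in a single run by timing the spawn loop under both settings. This is only a sketch of the measurement, not part of the original program, and the helper name measureSpawn is made up for illustration:

package main

import (
    "fmt"
    "runtime"
    "time"
)

// measureSpawn reports how long it takes to start n goroutines
// that all block on a channel until it is closed.
func measureSpawn(n int) time.Duration {
    die := make(chan bool)
    start := time.Now()
    for i := 0; i < n; i++ {
        go func() { <-die }()
    }
    elapsed := time.Since(start)
    close(die)
    return elapsed
}

func main() {
    const count = 100000

    runtime.GOMAXPROCS(1)
    fmt.Println("GOMAXPROCS(1):     ", measureSpawn(count))

    runtime.GOMAXPROCS(runtime.NumCPU())
    fmt.Println("GOMAXPROCS(NumCPU):", measureSpawn(count))
}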


Answer 1

Score: 9

When running on a single core, goroutine allocation and switching is just a matter of internal bookkeeping. Goroutines are never preempted, so the switching logic is extremely simple and very fast. More importantly in this case, your main routine never yields, so the goroutines never even begin executing before they are terminated. You allocate the structures and then throw them away, and that's that. (Edit: this may not be true with newer versions of Go, but it is certainly more orderly with only one process.)
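
One way to check that claim (a hypothetical variation, not code from the question) is to have each goroutine signal on a sync.WaitGroup before blocking, so that main cannot take the timing until every goroutine has actually been scheduled at least once:

package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

func main() {
    runtime.GOMAXPROCS(runtime.NumCPU())

    die := make(chan bool)
    var started sync.WaitGroup

    count := 100000
    start := time.Now()
    for i := 0; i < count; i++ {
        started.Add(1)
        go func() {
            started.Done() // signal that this goroutine was scheduled
            <-die          // then block, like waitAround does
        }()
    }
    started.Wait() // every goroutine has now run at least briefly
    fmt.Printf("spawn + first schedule of %d goroutines: %v\n",
        count, time.Since(start))

    close(die)
}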

But when you map the goroutines over multiple threads, OS-level context switching suddenly gets involved, which is orders of magnitude slower and more complex. Even though you are running on multiple cores, there is simply much more work to be done. On top of that, your goroutines may now actually be running before the program terminates.

Try running the program under strace in both configurations and see how its behavior differs.
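
If strace is not at hand, a rough in-process proxy (an assumption on my part, not something from the answer) is the runtime's built-in "threadcreate" profile, which counts how many OS threads the runtime has created; run it once with GOMAXPROCS(1) and once with all cores to compare:

package main

import (
    "fmt"
    "runtime"
    "runtime/pprof"
)

func main() {
    runtime.GOMAXPROCS(runtime.NumCPU()) // change to runtime.GOMAXPROCS(1) to compare

    die := make(chan bool)
    for i := 0; i < 100000; i++ {
        go func() { <-die }()
    }

    // "threadcreate" is one of the predefined pprof profiles; its Count
    // is the number of OS threads the runtime has created so far.
    fmt.Println("OS threads created:", pprof.Lookup("threadcreate").Count())

    close(die)
}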


Answer 2

Score: 4

It is always difficult to measure performance across multiple cores unless you have a significant workload that actually benefits from running on multiple cores. The problem is that the work has to be shared among threads and cores, and while the overhead of doing so may not be huge, it is still significant, especially for trivial code, and it lowers the overall performance.

And as you mentioned, it would be a completely different story if the goroutines did something CPU-intensive.
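
As a rough illustration of that point (a sketch with arbitrary workload numbers, not code from the answer), giving each goroutine some real CPU work makes the extra cores pay off instead of just adding scheduling overhead:

package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// burn does some throwaway CPU work so each goroutine has something to run.
func burn(n int) int {
    sum := 0
    for i := 0; i < n; i++ {
        sum += i * i
    }
    return sum
}

// run starts the given number of goroutines, waits for them all to
// finish their work, and reports the elapsed wall-clock time.
func run(goroutines, work int) time.Duration {
    var wg sync.WaitGroup
    start := time.Now()
    for i := 0; i < goroutines; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            burn(work)
        }()
    }
    wg.Wait()
    return time.Since(start)
}

func main() {
    runtime.GOMAXPROCS(1)
    fmt.Println("1 core:   ", run(1000, 1000000))

    runtime.GOMAXPROCS(runtime.NumCPU())
    fmt.Println("all cores:", run(1000, 1000000))
}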

