Why Golang scheduler uses two Queues (global run queue and local run queue) to manage goroutine?

huangapple go评论91阅读模式
英文:

Why Golang scheduler uses two Queues (global run queue and local run queue) to manage goroutine?

问题

我正在阅读关于Golang如何在应用程序中内部管理新创建的goroutine的内容。我了解到运行时调度器使用队列来管理创建的goroutine。

  1. 全局运行队列: 所有新创建的goroutine都被放置在这个队列中。

  2. 本地运行队列: 所有即将运行的goroutine都被分配到本地运行队列中,然后调度器会将其分配给操作系统线程。

所以,我的问题是为什么调度器要使用两个队列来管理goroutine。为什么不能只使用全局运行队列,然后调度器将其映射到操作系统线程呢?

英文:

I was reading how Golang internally manages new created goroutine in the application. And I come to know runtime scheduler use to queue to manage the created goroutines.

  1. Global run queue: All newly created goroutine is placed to this queue.

  2. Local run queue: All go routine which is about to run is allocated to local run queue and from there scheduler will assign it to OS thread.

So, Here my question is why scheduler is using two queues to manage goroutine. Why can't they just use global run queue and from there scheduler will map it to OS thread.

答案1

得分: 3

首先,请注意,这篇博客是一个非官方且旧的来源,所以其中的信息不能完全准确地反映当前版本的Go(或任何版本)。你仍然可以从中学习,但Go调度器随着时间的推移得到了改进,这可能导致信息过时。例如,博客中提到“Go调度器不是抢占式调度器,而是合作式调度器”。截至Go 1.14,这个说法已经不再正确,因为抢占已经添加到了运行时中。至于其他信息,我不能保证其准确性,但下面是他们所说的内容的解释。

阅读博客文章:

Go调度器有两个不同的运行队列:全局运行队列(GRQ)和本地运行队列(LRQ)。每个P都有一个LRQ,用于管理分配给该P上下文中要执行的Goroutine。这些Goroutine轮流在分配给该P的M上进行上下文切换。GRQ用于尚未分配给P的Goroutine。有一个过程将Goroutine从GRQ移动到LRQ,我们稍后会讨论。

这意味着GRQ用于尚未被分配运行的Goroutine,LRQ用于已被分配给P运行或已经开始执行的Goroutine。每个Goroutine都将从GRQ开始,并在稍后加入LRQ开始执行。

以下是前面引用的过程,其中Goroutine从GRQ移动到LRQ:

在图10中,P1没有更多要执行的Goroutine。但是,LRQ中和GRQ中都有处于可运行状态的Goroutine。这是P1需要窃取工作的时刻。窃取工作的规则如下。

runtime.schedule() {
    // 只有1/61的时间,检查全局可运行队列是否有G。
    // 如果没有找到,检查本地队列。
    // 如果没有找到,
    //     尝试从其他P窃取。
    //     如果没有找到,检查全局可运行队列。
    //     如果没有找到,轮询网络。
}

这意味着P将优先运行其自己的LRQ中的Goroutine,然后是其他P的LRQ,然后是GRQ,最后是网络轮询。还有一种小概率会立即运行来自GRQ的Goroutine。通过使用多个队列,可以构建这种优先级系统。

为什么我们希望有优先级来确定哪些Goroutine被运行?这可能有各种性能优势。例如,它可以更好地利用CPU缓存。如果你运行的是最近已经运行过的Goroutine,它正在处理的数据很可能仍然在CPU缓存中,使得访问速度很快。当你启动一个新的Goroutine时,它可能使用或创建尚未在缓存中的数据。然后,该数据将进入缓存,并可能驱逐另一个Goroutine正在使用的数据,这会导致该Goroutine在恢复时变慢。在极端情况下,这被称为缓存抖动,会大大降低处理器的有效速度。

使CPU缓存有效工作可能是实现高性能的最重要因素之一,但这并不是拥有这种队列系统的唯一原因。一般来说,同时运行的逻辑进程(例如Go程序中的Goroutine)越多,资源争用就越多。这是因为进程使用的资源在进程的运行时间内趋于相对稳定。换句话说,每次启动一个新进程都会增加总体资源负载,而继续运行已经启动的进程则会保持资源负载,并且完成一个进程会减少资源负载。因此,优先考虑已经运行的进程而不是新进程有助于保持资源负载在可管理的范围内。

这类似于实际建议中的“完成你开始的事情”。如果你有很多任务要完成,一个接一个地完成它们或者同时处理少数几件事情会更有效。如果你不断开始新任务而不完成之前的任务,最终你会同时进行很多事情,感到不知所措。

英文:

First, please note that this blog is an unofficial and old source, so the information in it shouldn't be taken as totally accurate with respect to the current version of Go (or any version, for that matter). You can still learn from it, but the Go scheduler is improved over time, which can make information out of date. For example, the blog says "Go scheduler is not a preemptive scheduler but a cooperating scheduler". As of Go 1.14, this is no longer true as preemption was added to the runtime. As for the other information, I won't vouch for it's accuracy, but here's an explanation of what they say.


Reading the blog post:
> There are two different run queues in the Go scheduler: the Global Run Queue (GRQ) and the Local Run Queue (LRQ). Each P is given a LRQ that manages the Goroutines assigned to be executed within the context of a P. These Goroutines take turns being context-switched on and off the M assigned to that P. The GRQ is for Goroutines that have not been assigned to a P yet. There is a process to move Goroutines from the GRQ to a LRQ that we will discuss later.

This means the GRQ is for Goroutines that haven't been assigned to run yet, the LRQ is for Goroutines that have been assigned to a P to run or have already begun executing. Each Goroutine will start on the GRQ, and join a LRQ later to begin executing.

Here is the process that the previous quote was referencing, where Goroutines are moved from the GRQ to LRQ:
> In figure 10, P1 has no more Goroutines to execute. But there are Goroutines in a runnable state, both in the LRQ for P2 and in the GRQ. This is a moment where P1 needs to steal work. The rules for stealing work are as follows.
>
> runtime.schedule() {
> // only 1/61 of the time, check the global runnable queue for a G.
> // if not found, check the local queue.
> // if not found,
> // try to steal from other Ps.
> // if not, check the global runnable queue.
> // if not found, poll network.
> }
>

This means a P will prioritize running goroutines in their own LRQ, then from other P's LRQ, then from the GRQ, then from network polling. There is also a small chance to immediately run a Goroutine from the GRQ immediately. By having multiple queues, it allows this priority system to be constructed.

Why do we want priority in which goroutines get run? It may have various performance benefits. For example, it could make better use of the CPU cache. If you run a Goroutine that was already running recently, it's more likely that the data it's working with is still in the CPU cache, making it fast to access. When you start up a new Goroutine, it may use or create data that isn't in the cache yet. That data will then enter the cache and could evict the data being used by another Goroutine, which in turn causes that Goroutine to be slower when it resumes again. In the pathological case, this is called cache thrashing, and greatly reduces the effective speed of the processor.

Allowing the CPU cache to work effectively can be one of the most important factors in achieving high performance on modern processors, but it's not the only reason to have such a queue system. In general, the more logical processes that are running at the same time (such as Goroutines in a Go program), the more resource contention will occur. This is because the resources used by a process tend to be fairly stable over the runtime of the process. In other words, every time you start a new process tends to increase the overall resource load, while continuing an already started process tends to maintain the resource load, and finishing a process tends to reduce the resource load. Therefore, prioritizing already running processes over new processes would tend to help keep the resource load in a manageable range.

It's analogous to the practical advice of "finish what you started". If you have a lot of tasks to accomplish, it's more effective to complete them one at a time, or multitask just a handful of things if you can. If you just keep starting new tasks and never finished the previous ones, eventually you have so many things going on at the same time that you feel overwhelmed.

huangapple
  • 本文由 发表于 2023年1月7日 10:33:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/75037734.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定