
Go goroutine under the hood

Question


I'm trying to understand Go's architecture and what "lightweight thread" means. I've already read a bit, but want to ask a question to clarify it.

Am I right in saying that, under the hood, the "go" keyword just puts the following function into the queue of an internal thread pool, but to the user it looks like the creation of a thread?

Answer 1

Score: 4


This is copied from the Go FAQ:

> ### Why goroutines instead of threads?
>
> Goroutines are part of making concurrency easy to use. The idea, which has been around for a while, is to multiplex independently executing functions—coroutines—onto a set of threads. When a coroutine blocks, such as by calling a blocking system call, the run-time automatically moves other coroutines on the same operating system thread to a different, runnable thread so they won't be blocked. The programmer sees none of this, which is the point. The result, which we call goroutines, can be very cheap: they have little overhead beyond the memory for the stack, which is just a few kilobytes.

What's lacking here is the definition of thread. If we resort to Wikipedia, we find:

> In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, ...

but that's just a description of, well, the same thing that a goroutine is. The problem here is that the word thread tends to refer to kernel thread and/or user thread (both defined on that same Wikipedia page) and these threads are heavier-weight than the goroutine threads. Which brings us right back to this:

> I'm trying to understand golang architecture and what "lightweight thread" means ...

To cut to the chase, this means "lighter than the OS-provided ones". That's really all it means. There are OS-provided threads (on multiple OSes on which Go runs), but they generally do too much and cost too much to switch between so Go provides its own language-level ones that it calls "goroutines" that are much lighter.

From comments:

> Why need to move tasks from one thread to another by some planner ...

This is an implementation detail, which involves another aspect of the OS-provided kernel threads:

> I can't understand how [a goroutine] can be preempted if single thread <s>process</s> [is] blocked by [a] system call to read [a] long file

The current Go runtime goroutine / thread / processor scheduler (see https://stackoverflow.com/q/48638663/1256452 and note that there have been more than just the current implementation) predicts that some system call will block, and makes sure to assign that system call its own OS-level kernel thread (see also JimB's comment). These threads do not count against the GOMAXPROCS setting. This is in fact sometimes a problem, as it's possible for the Go runtime to try to spin off more threads than the OS allows: it might be nice if there were a system-call-thread-pool here (though there are also obvious problems with this).

So, the current runtime creates up to GOMAXPROCS kernel-style OS-level threads and uses those to multiplex up to that many goroutines onto the CPUs, but creates extra kernel-style OS-level threads whenever it wants to. As the blog post linked in the question above notes, the P entities act as queues to hold goroutines (Gs) on a per-processor basis for localized cache lookup (remember that on some systems, especially NUMA ones, it's expensive to reach out "across" CPUs: the scheduler is still willing to do this, but won't do it too often, for some definition of "too often").

Earlier versions of the current scheduler required explicit yields (runtime.Gosched() calls) or various other runtime operations to cause a switch from the current goroutine to some other goroutine. See https://stackoverflow.com/q/13107958/1256452 for example. As of Go 1.14, the runtime preempts goroutines automatically on most OSes; see https://stackoverflow.com/q/68696886/1256452.

huangapple
  • Published on September 1, 2022 06:23:30
  • Please keep this link when reposting: https://go.coder-hub.com/73562498.html