Will Go block the current thread when doing I/O inside a goroutine?
Question
I am confused over how Go handles non-blocking I/O. Go's APIs look mostly synchronous to me, and when watching presentations on Go, it's not uncommon to hear comments like "and the call blocks".
Is Go using blocking I/O when reading from files or the network? Or is there some kind of magic that re-writes the code when used from inside a goroutine?
Coming from a C# background, this feels very unintuitive, as in C# we have the await keyword when consuming async APIs, which clearly communicates that the API can yield the current thread and continue later inside a continuation.
TL;DR: will Go block the current thread when doing I/O inside a goroutine, or will it be transformed into a C#-like async/await state machine using continuations?
Answer 1
Score: 57
Go has a scheduler that lets you write synchronous code; it does the context switching on its own and uses async I/O under the hood. So if you're running several goroutines, they might run on a single system thread, and when your code blocks from the goroutine's point of view, it's not really blocking. It's not magic, but yes, it masks all of this from you.
The scheduler allocates system threads when they're needed, and uses them for operations that are really blocking (file I/O, for example, or calling C code). But if you're writing a simple HTTP server, you can have thousands and thousands of goroutines backed by only a handful of "real threads".
You can read more about the inner workings of Go here.
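To make that concrete, here is a minimal sketch of my own (not part of the answer; the counts are illustrative): it parks a batch of goroutines in network reads against a local listener that never replies, then prints the goroutine count next to the number of OS threads the runtime has created. On a typical run the thread count should stay in the low tens while every client goroutine sits "blocked" in Read.

package main

import (
    "fmt"
    "net"
    "runtime"
    "runtime/pprof"
    "time"
)

func main() {
    // Local listener whose connections are accepted but never written to,
    // so every client Read below "blocks" from the goroutine's point of view.
    ln, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        panic(err)
    }
    var held []net.Conn // keep server-side conns alive so clients stay blocked
    go func() {
        for {
            c, err := ln.Accept()
            if err != nil {
                return
            }
            held = append(held, c)
        }
    }()

    // Raise this (subject to your file-descriptor limit) to see thousands.
    const clients = 100
    for i := 0; i < clients; i++ {
        go func() {
            c, err := net.Dial("tcp", ln.Addr().String())
            if err != nil {
                return
            }
            buf := make([]byte, 1)
            c.Read(buf) // parks this goroutine; no OS thread sits waiting on it
        }()
    }

    time.Sleep(2 * time.Second) // let everything connect and park
    fmt.Println("goroutines:", runtime.NumGoroutine())
    fmt.Println("OS threads created:", pprof.Lookup("threadcreate").Count())
}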
Answer 2
Score: 37
You should read @Not_a_Golfer's answer first, and the link he provided, to understand how goroutines are scheduled. My answer is more of a deeper dive into network IO specifically. I assume you understand how Go achieves cooperative multitasking.
Go can and does use only blocking calls because everything runs in goroutines and they're not real OS threads. They're green threads. So you can have many of them all blocking on IO calls and they will not eat all of your memory and CPU like OS threads would.
File IO is just syscalls. Not_a_Golfer already covered that. Go will use a real OS thread to wait on the syscall and will unblock the goroutine when it returns. Here you can see the file read implementation for Unix.
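As a hedged, Unix-only sketch of that point (my own illustration, not from the answer): goroutines sitting in a genuinely blocking syscall each hold on to an OS thread, which you can observe through the thread-creation profile. The raw pipe below is created with syscall.Pipe precisely so that it bypasses the runtime's poller, unlike the descriptors the net package manages.

package main

import (
    "fmt"
    "runtime/pprof"
    "syscall"
    "time"
)

func main() {
    threads := pprof.Lookup("threadcreate")
    before := threads.Count()

    // Each goroutine blocks in read(2) on an empty, blocking pipe. These file
    // descriptors are not registered with the network poller, so the runtime
    // ends up dedicating an OS thread to each stuck syscall.
    const n = 20
    for i := 0; i < n; i++ {
        go func() {
            p := make([]int, 2)
            if err := syscall.Pipe(p); err != nil {
                return
            }
            buf := make([]byte, 1)
            syscall.Read(p[0], buf) // never returns: nothing writes to the pipe
        }()
    }

    time.Sleep(2 * time.Second)
    fmt.Printf("OS threads created: %d before, %d after (~%d goroutines stuck in syscalls)\n",
        before, threads.Count(), n)
}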
Network IO is different. The runtime uses a "network poller" to determine which goroutine should be unblocked from an IO call. Depending on the target OS, it uses the available asynchronous APIs to wait for network IO events. The calls look blocking, but inside everything is done asynchronously.
For example, when you call read on a TCP socket, the goroutine first tries to read using a syscall. If nothing has arrived yet, it blocks and waits to be resumed. By blocking here I mean parking, which puts the goroutine in a queue where it awaits resuming. That's how a "blocked" goroutine yields execution to other goroutines when you use network IO.
func (fd *netFD) Read(p []byte) (n int, err error) {
    if err := fd.readLock(); err != nil {
        return 0, err
    }
    defer fd.readUnlock()
    if err := fd.pd.PrepareRead(); err != nil {
        return 0, err
    }
    for {
        n, err = syscall.Read(fd.sysfd, p)
        if err != nil {
            n = 0
            if err == syscall.EAGAIN {
                if err = fd.pd.WaitRead(); err == nil {
                    continue
                }
            }
        }
        err = fd.eofError(n, err)
        break
    }
    if _, ok := err.(syscall.Errno); ok {
        err = os.NewSyscallError("read", err)
    }
    return
}
https://golang.org/src/net/fd_unix.go?s=#L237
When data arrives, the network poller returns the goroutines that should be resumed. You can see here the findrunnable function that searches for a goroutine that can be run. It calls the netpoll function, which returns goroutines that can be resumed. You can find the kqueue implementation of netpoll here.
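From the user's point of view, the parking and resuming described above looks roughly like this. It is a sketch of my own against a loopback connection (not runtime code): the reader goroutine "blocks" in conn.Read for about a second, but only that goroutine is parked; the main goroutine keeps running, and the reader is resumed as soon as the byte arrives.

package main

import (
    "fmt"
    "net"
    "time"
)

func main() {
    ln, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        panic(err)
    }

    // Server side: accept one connection, wait a second, then send one byte.
    go func() {
        conn, err := ln.Accept()
        if err != nil {
            return
        }
        defer conn.Close()
        time.Sleep(time.Second)
        conn.Write([]byte{42})
    }()

    conn, err := net.Dial("tcp", ln.Addr().String())
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    done := make(chan struct{})
    go func() {
        // This Read hits EAGAIN internally and the goroutine is parked
        // (pd.WaitRead in the excerpt above) until the poller sees data.
        buf := make([]byte, 1)
        start := time.Now()
        n, err := conn.Read(buf)
        fmt.Printf("Read returned n=%d err=%v after %v\n",
            n, err, time.Since(start).Round(time.Millisecond))
        close(done)
    }()

    // Meanwhile the main goroutine is not blocked at all.
    for i := 0; i < 3; i++ {
        fmt.Println("main goroutine still running")
        time.Sleep(300 * time.Millisecond)
    }
    <-done
}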
As for async/await in C#: async network IO also uses asynchronous APIs (IO completion ports on Windows). When something arrives, the OS executes a callback on one of the thread pool's completion-port threads, which posts the continuation on the current SynchronizationContext. In a sense there are some similarities (parking/unparking does look like calling continuations, but on a much lower level), but the models are very different, not to mention the implementations. Goroutines are not bound to a specific OS thread by default; they can be resumed on any one of them, it doesn't matter. There are no UI threads to deal with. Async/await is made specifically for the purpose of resuming the work on the same OS thread using the SynchronizationContext. And because there are no green threads or a separate scheduler, async/await has to split your function into multiple callbacks that get executed on that SynchronizationContext, which is basically an infinite loop checking a queue of callbacks that should be executed. You can even implement it yourself; it's really easy.
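To illustrate that last point, here is a tiny sketch in Go (the language of the rest of the code in this thread); the syncContext type and its Post method are made-up names, a rough stand-in for the idea rather than C#'s actual API. A synchronization context boils down to one loop draining a queue of callbacks, so every continuation posted to it runs on the same goroutine.

package main

import "fmt"

// syncContext is a made-up, minimal stand-in for the idea behind C#'s
// SynchronizationContext: every callback posted to it runs on one goroutine.
type syncContext struct {
    queue chan func()
}

func newSyncContext() *syncContext {
    c := &syncContext{queue: make(chan func(), 64)}
    go func() {
        // The "infinite loop that checks a queue of callbacks".
        for cb := range c.queue {
            cb()
        }
    }()
    return c
}

// Post schedules a callback (a continuation) on the context's loop.
func (c *syncContext) Post(cb func()) {
    c.queue <- cb
}

func main() {
    ctx := newSyncContext()
    done := make(chan struct{})

    ctx.Post(func() { fmt.Println("step 1") })
    ctx.Post(func() { fmt.Println("step 2: a continuation posted back to the same loop") })
    ctx.Post(func() { close(done) })

    <-done
}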