Kotlin - Suspend functions in single-threaded environments

Question

I'm not entirely sure whether my mental model of suspend is correct. From what I gathered, it seems to mean that a (long-running) suspend function can be suspended if another function inside it is marked with suspend (which generates a suspension point for the parent function).

To keep it simple, let's assume a single-threaded environment without asynchronous programming:

```kotlin
launch { //<--- creates a coroutine in which we can use suspend functions
    fetchUserData("Jon")
}

// and the following functions:

suspend fun fetchUserData(userName: String) {
    makeLongRunningNetworkCall(userName) //<---- suspends fetchUserData()
}

suspend fun makeLongRunningNetworkCall(userName: String) {...}
```

My understanding is that makeLongRunningNetworkCall() "takes" fetchUserData() "off" the thread, so that it doesn't block other computations while waiting for the result of makeLongRunningNetworkCall().

But if nothing suspends makeLongRunningNetworkCall(), isn't the thread still blocked by makeLongRunningNetworkCall()? I mean, the "waiting" for the network result has to be done somewhere, or else the result might be missed?

So to me, suspend would only make sense in that case if fetchUserData() and makeLongRunningNetworkCall() ran on different threads, so that makeLongRunningNetworkCall() tells its parent function to go home and free its thread until it has received a result?!

Is my understanding correct? Or does suspend rather mean the whole coroutine is taken off the thread? But then again, who assures the response of the network call is captured?

Answer 1

Score: 1

A suspend function only actually suspends if the suspend functions that it calls also suspend, and so on down the chain.

This is one reason why, by convention, you must never directly call a blocking function in a coroutine or suspend function. All of the suspend functions in the standard library follow this convention. (The other big reason is simplicity. We never have to worry about whether a suspend function might also be a blocking function that ties up a thread it shouldn't.)
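A minimal sketch of why this convention matters, assuming kotlinx.coroutines is on the classpath. The two "network calls" below are hypothetical stand-ins: one really suspends via delay(), the other is marked suspend but blocks with Thread.sleep(). runBlocking gives us exactly one thread, and a second coroutine records when it gets a chance to run.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical network call that really suspends: delay() hands the thread back to the event loop.
suspend fun suspendingCall(log: MutableList<String>) {
    log += "suspending call started"
    delay(100)
    log += "suspending call finished"
}

// Hypothetical network call that is marked suspend but never suspends:
// Thread.sleep() keeps the only thread busy for the whole wait.
@Suppress("RedundantSuspendModifier")
suspend fun blockingCall(log: MutableList<String>) {
    log += "blocking call started"
    Thread.sleep(100)
    log += "blocking call finished"
}

// Runs `call` plus a second coroutine on runBlocking's single thread and returns the order of events.
fun runOnOneThread(call: suspend (MutableList<String>) -> Unit): List<String> = runBlocking {
    val log = mutableListOf<String>()
    launch { call(log) }
    launch { log += "other coroutine ran" }
    log // runBlocking waits for both children before returning, so the list is complete
}
```

With runOnOneThread { suspendingCall(it) } the other coroutine runs in the middle of the call; with runOnOneThread { blockingCall(it) } it only runs after the "suspend" function has finished, because nothing ever suspended.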

A common pattern for handling blocking calls is to wrap them in withContext. withContext is a suspend function that suspends the caller while its block runs in a different CoroutineContext, typically one whose dispatcher can tolerate blocking. Use Dispatchers.IO for blocking I/O or Dispatchers.Default for CPU-heavy work, as appropriate, so that calling blocking functions inside the withContext lambda becomes permissible.
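A sketch of the withContext pattern, assuming kotlinx.coroutines is available. legacyBlockingFetch is a hypothetical blocking API (think of a JDBC query); the wrapper moves it to a Dispatchers.IO thread so the calling coroutine suspends instead of blocking its own thread.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical legacy API that blocks whichever thread calls it.
fun legacyBlockingFetch(userName: String): String {
    Thread.sleep(50) // simulates waiting on a blocking driver or socket
    return "data for $userName (loaded on ${Thread.currentThread().name})"
}

// Suspending wrapper: the blocking call occupies an IO thread while the caller is suspended.
suspend fun fetchUserData(userName: String): String =
    withContext(Dispatchers.IO) { legacyBlockingFetch(userName) }

fun demo(): String = runBlocking { fetchUserData("Jon") }
```

The returned string names the worker thread that did the blocking, which is one of the shared DefaultDispatcher-worker threads rather than the runBlocking thread.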

If you are using popular libraries such as Retrofit, Jetpack Room, Google Firebase, etc., they expose public suspend functions that, as the convention requires, you can trust not to block. Since most of these libraries support both Java and Kotlin, they generally use their own thread pools under the hood rather than the Kotlin coroutines dispatcher pools. They achieve this with the low-level suspendCancellableCoroutine suspend function, which gives finer control over exactly how the coroutine is suspended and resumed.
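A sketch of how a callback-based client can be exposed as a non-blocking suspend function with suspendCancellableCoroutine. CallbackHttpClient and its get method are hypothetical (not any real library's API); they only mimic a Java-style client that runs requests on its own thread pool and reports through callbacks. Real libraries differ in the details.

```kotlin
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.suspendCancellableCoroutine
import java.util.concurrent.Executors
import java.util.concurrent.Future
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException

// Hypothetical Java-style client with its own thread pool and callback-based API.
class CallbackHttpClient {
    private val pool = Executors.newFixedThreadPool(2)

    fun get(url: String, onSuccess: (String) -> Unit, onError: (Throwable) -> Unit): Future<*> =
        pool.submit(Runnable {
            try {
                Thread.sleep(50) // pretend network latency, spent on the client's own thread
                onSuccess("response from $url")
            } catch (e: InterruptedException) {
                onError(e)
            }
        })

    fun shutdown() = pool.shutdown()
}

// Suspends the caller without blocking it; the client's callback resumes the coroutine later.
suspend fun CallbackHttpClient.await(url: String): String =
    suspendCancellableCoroutine { cont ->
        val task = get(
            url,
            onSuccess = { body -> cont.resume(body) },
            onError = { error -> cont.resumeWithException(error) },
        )
        // If the coroutine is cancelled, cancel the underlying request as well.
        cont.invokeOnCancellation { task.cancel(true) }
    }

fun demo(): String {
    val client = CallbackHttpClient()
    try {
        return runBlocking { client.await("https://example.com/users/Jon") }
    } finally {
        client.shutdown()
    }
}
```

Resuming a CancellableContinuation that was already cancelled is ignored, so a late callback after cancellation is harmless here.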


Another aspect worth mentioning, although you didn't ask about it, is whether the suspend function supports cancellation. Usually, we want to support cancellation if possible, and all the standard library suspend functions do. If you are wrapping a long blocking calculation in withContext, that alone is not enough to support cancellation. You must also intersperse some suspending calls or if (isActive) checks within your blocking code, so that it has opportunities to be interrupted and cancelled if you want it to be able to stop before finishing.
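A sketch of cooperative cancellation in a long CPU-bound loop, assuming kotlinx.coroutines. busyComputation is a hypothetical workload; instead of an if (isActive) check it calls ensureActive(), which throws a CancellationException once the job is cancelled. Without that call, the cancel() below would have no effect until the loop ended.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.ensureActive
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import java.util.concurrent.atomic.AtomicLong

// Hypothetical CPU-bound work: withContext moves it off the caller's thread,
// but only the periodic ensureActive() checks let it react to cancellation.
suspend fun busyComputation(iterations: Long, progress: AtomicLong): Long =
    withContext(Dispatchers.Default) {
        var acc = 0L
        for (i in 1..iterations) {
            if (i % 10_000 == 0L) {
                ensureActive() // throws CancellationException if this coroutine was cancelled
                progress.set(i)
            }
            acc += i % 7
        }
        acc
    }

// Starts a practically endless computation, cancels it after about 100 ms, and reports the outcome.
fun demo(): Pair<Long, Boolean> = runBlocking {
    val progress = AtomicLong(0)
    val job = launch { busyComputation(Long.MAX_VALUE, progress) }
    delay(100)
    job.cancel()
    job.join() // returns only because the loop noticed the cancellation
    progress.get() to job.isCancelled
}
```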

Answer 2

Score: 0

> Is my understanding correct? Or does suspend rather mean the whole coroutine is taken off the thread?

Rather the second, but in order to see the difference, you'll need to add some code in fetchUserData. For instance, consider:

```kotlin
suspend fun fetchUserData(userName: String): UserData {
    val userData = makeLongRunningNetworkCall(userName) // may suspend here
    return userData // runs only after the call above has resumed
}
```

If makeLongRunningNetworkCall suspends, then fetchUserData does need to wait for it to resume before executing the rest of its code (the return). Similarly, the caller of fetchUserData also needs to wait, and so on. This is why it's easy to reason about suspend functions: in that sense, they actually run sequentially.
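A runnable version of the snippet above, assuming kotlinx.coroutines and a hypothetical makeLongRunningNetworkCall that uses delay() in place of a real non-blocking client. The log shows that everything after the suspension point, in fetchUserData and in the launch block, waits, while another coroutine uses the same thread in the meantime.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

data class UserData(val name: String)

// Hypothetical network call: delay() stands in for waiting on a non-blocking client.
suspend fun makeLongRunningNetworkCall(userName: String, log: MutableList<String>): UserData {
    log += "network call started"
    delay(100) // the whole coroutine, up to launch, is suspended here
    log += "network call resumed"
    return UserData(userName)
}

suspend fun fetchUserData(userName: String, log: MutableList<String>): UserData {
    val userData = makeLongRunningNetworkCall(userName, log)
    log += "fetchUserData returning" // runs only after the call above has resumed
    return userData
}

fun demo(): List<String> = runBlocking {
    val log = mutableListOf<String>()
    launch {
        val user = fetchUserData("Jon", log)
        log += "launch block got ${user.name}" // the caller waits as well
    }
    launch { log += "another coroutine used the thread" }
    log
}
```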

So, with that in mind, the whole coroutine is suspended (the whole stack up to the initial coroutine builder launch), because nothing in the execution stack will carry on until makeLongRunningNetworkCall resumes.

> But then again, who assures the response of the network call is captured?

This is a good question. All of the above actually started with the assumption that makeLongRunningNetworkCall does suspend. This is not an abstract concept. Concretely, it means that the function will return (as in, actually return) a special token called COROUTINE_SUSPENDED. Each caller in the chain sees that token and returns it too, so the whole suspension mechanism happens and the thread starts executing something else.
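A low-level sketch of what "returning COROUTINE_SUSPENDED" looks like, using only the Kotlin standard library intrinsics that kotlinx.coroutines builds on (application code normally never calls these directly). awaitResponse and pendingResponse are hypothetical names: the continuation is stored somewhere, and whoever holds it later delivers the "network response". That stored continuation is what makes sure the result is not missed.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.intrinsics.COROUTINE_SUSPENDED
import kotlin.coroutines.intrinsics.suspendCoroutineUninterceptedOrReturn
import kotlin.coroutines.resume
import kotlin.coroutines.startCoroutine

// Whoever holds this continuation is responsible for delivering the response later.
var pendingResponse: Continuation<String>? = null

// Store the continuation and return the special marker instead of a value.
suspend fun awaitResponse(): String = suspendCoroutineUninterceptedOrReturn { cont ->
    pendingResponse = cont
    COROUTINE_SUSPENDED
}

fun demo(): List<String> {
    val log = mutableListOf<String>()
    val body: suspend () -> String = {
        log += "coroutine: waiting for response"
        val response = awaitResponse()
        log += "coroutine: got $response"
        response
    }
    body.startCoroutine(Continuation<String>(EmptyCoroutineContext) { result ->
        log += "coroutine finished: ${result.getOrNull()}"
    })
    // startCoroutine returned because the coroutine suspended; this thread is free again.
    log += "caller: startCoroutine returned"
    // Later, for example from an I/O completion callback, the response is delivered.
    pendingResponse?.resume("200 OK")
    return log
}
```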

This means that this function must be cooperative and suspend when it's possible. If the function actually blocks on a network call, then it doesn't actually suspend, and it really does block the thread, as you guessed. When functions like this actually suspend, it often means that they offloaded their blocking work to another thread, or that they are truly non-blocking (e.g. callback-based); sometimes, though, it just means the offloading happens deeper down.

To understand and demystify how this works, I suggest reading this nice article about how kotlinx.coroutines is built on just a couple of compiler built-ins:
https://blog.kotlin-academy.com/kotlin-coroutines-animated-part-1-coroutine-hello-world-51797d8b9cd4

Answer 3

Score: 0

> Let's assume a single-threaded environment without asynchronous programming

Asynchronous programming can be achieved even in a single-threaded environment, for example with asyncio in Python or with Node.js. How? Coroutines simply take turns on a single thread (for micro- or milliseconds at a time), which gives concurrency, but nothing actually runs in parallel. This is still very useful for I/O-bound tasks, because they mostly trigger some work and then wait for the result. For CPU-bound tasks, a single thread is not enough, since only one computation can make progress at a time.
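The same idea in Kotlin, as a sketch assuming kotlinx.coroutines: three hypothetical I/O-bound requests (simulated with delay()) run concurrently on runBlocking's single thread. Because each one spends its time suspended rather than computing, the total time is close to one request's latency, not three.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Simulated I/O-bound request: it spends its time suspended in delay(), not computing.
suspend fun fakeRequest(id: Int): String {
    delay(200)
    return "response $id on ${Thread.currentThread().name}"
}

// Runs three requests concurrently on one thread; returns the responses and elapsed milliseconds.
fun demo(): Pair<List<String>, Long> = runBlocking {
    val start = System.currentTimeMillis()
    val responses = (1..3).map { id -> async { fakeRequest(id) } }.awaitAll()
    responses to (System.currentTimeMillis() - start)
}
```

Replacing delay(200) with a CPU-bound loop would make the three requests run back to back, which is the limitation described above.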

> But if nothing suspends makeLongRunningNetworkCall(), isn't the thread still blocked by makeLongRunningNetworkCall()? Or does suspend rather mean the whole coroutine is taken off the thread?

When a coroutine is suspended, it is completely taken off the thread, freeing that thread for other work. Later, when the coroutine is resumed, it might resume on a different thread.
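A sketch that makes "it might resume on a different thread" visible, assuming kotlinx.coroutines. The completion thread named "io-completion" is hypothetical, standing in for whatever thread delivers an I/O result. Dispatchers.Unconfined is used only because it continues on whichever thread calls resume(), which makes the switch deterministic for this illustration; with other dispatchers the resumed code is dispatched back to the dispatcher's own threads.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import java.util.concurrent.Executors
import kotlin.coroutines.resume
import kotlin.coroutines.suspendCoroutine

// Returns the names of the threads that ran the code before and after the suspension point.
fun demo(): Pair<String, String> {
    // Hypothetical "I/O completion" thread that delivers the result.
    val completionThread = Executors.newSingleThreadExecutor { r -> Thread(r, "io-completion") }
    try {
        return runBlocking {
            withContext(Dispatchers.Unconfined) {
                val before = Thread.currentThread().name
                // The coroutine leaves its thread here; another thread resumes it.
                suspendCoroutine<Unit> { cont ->
                    completionThread.execute { cont.resume(Unit) }
                }
                val after = Thread.currentThread().name
                before to after
            }
        }
    } finally {
        completionThread.shutdown()
    }
}
```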

> But then again, who assures the response of the network call is captured? I mean the "waiting" for the network result has to be done somewhere or else the result might be missed?

No thread is required to wait for the result. Otherwise, we would just be offloading work to dedicated thread pools in each library (network I/O, database, file I/O, ...) and adding another layer of abstraction. There would be no real benefit if each library had a separate thread (per read/write request) waiting for the result.

So how can we await a result without a thread sitting on it?

The simplified explanation is that the I/O library calls (directly or indirectly) the OS's low-level functions, which call a device driver at some point. The device driver returns to the OS immediately, and the request is now in flight and performed asynchronously.

> Regardless of the type of I/O request, internally I/O operations issued to a driver on behalf of the application are performed asynchronously; that is, once an I/O request has been initiated, the device driver returns to the I/O system. Whether or not the I/O system returns immediately to the caller depends on whether the handle was opened for synchronous or asynchronous I/O.

The OS returns to the library, which returns to the caller in some form (callback, Future, Observable, etc.), and the caller is not blocked. No thread is waiting for the result.
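A sketch of the "library returns to the caller with a callback" shape on the JVM, assuming kotlinx.coroutines. It wraps the JDK's java.nio AsynchronousFileChannel, whose read method returns immediately and reports completion through a CompletionHandler, in a suspend function via suspendCancellableCoroutine. Caveat: on some platforms the JDK implements AsynchronousFileChannel with an internal thread pool, so this shows the callback-based API shape rather than proving that no thread is blocked at the OS level.

```kotlin
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.suspendCancellableCoroutine
import java.nio.ByteBuffer
import java.nio.channels.AsynchronousFileChannel
import java.nio.channels.CompletionHandler
import java.nio.charset.StandardCharsets
import java.nio.file.Path
import java.nio.file.StandardOpenOption
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException

// read() returns immediately; the CompletionHandler later resumes the suspended coroutine.
suspend fun AsynchronousFileChannel.readAwait(buffer: ByteBuffer, position: Long): Int =
    suspendCancellableCoroutine { cont ->
        read(buffer, position, Unit, object : CompletionHandler<Int, Unit> {
            override fun completed(result: Int, attachment: Unit) = cont.resume(result)
            override fun failed(exc: Throwable, attachment: Unit) = cont.resumeWithException(exc)
        })
        // Closing the channel aborts an in-flight read if the coroutine is cancelled.
        cont.invokeOnCancellation { runCatching { close() } }
    }

// Reads a small file completely; the coroutine is suspended while each read is in flight.
suspend fun readFileText(path: Path): String =
    AsynchronousFileChannel.open(path, StandardOpenOption.READ).use { channel ->
        val buffer = ByteBuffer.allocate(channel.size().toInt())
        var position = 0L
        while (buffer.hasRemaining()) {
            val n = channel.readAwait(buffer, position)
            if (n < 0) break // end of file
            position += n
        }
        buffer.flip()
        StandardCharsets.UTF_8.decode(buffer).toString()
    }

fun demo(path: Path): String = runBlocking { readFileText(path) }
```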

> Some time after the request started, the device finishes handling the request. It notifies the CPU via an interrupt... The device driver’s Interrupt Service Routine (ISR) responds to the interrupt... then a Deferred Procedure Call (DPC) is queued... The DPC takes the IRP representing the initial request and marks it as “complete”. However, that “completion” status only exists at the OS level. The OS queues an Asynchronous Procedure Call (APC) to the thread owning the device’s underlying HANDLE... Since the library/BCL has already registered the handle with the I/O Completion Port (IOCP), which is part of the thread pool, an I/O thread pool thread is borrowed briefly to execute the APC, which notifies the task that it’s complete.

> There was no thread while the request was in flight. When the request completed, various threads were “borrowed” or had work briefly queued to them. This work is usually on the order of a millisecond or so (e.g., the APC running on the thread pool) down to a microsecond or so (e.g., the ISR). But there is no thread that was blocked, just waiting for that request to complete.

Posted by huangapple on 2023-07-10 21:17:43
Source: https://go.coder-hub.com/76654148.html