Avoid duplicating sync and async code in Rust library
Question
I recently came across a library that provides both sync and async interfaces. Async support can be enabled with the async feature flag, and the async/sync functions are distinguished with compiler directives.
E.g. here's what a sync function looks like:
#[cfg(not(feature = "async"))]
fn perform_query<A: ToSocketAddrs>(&self, payload: &[u8], addr: A) -> Result<Vec<u8>>
{
// More than 100 lines of code with occasional calls to sync UdpSocket::send_to and recv.
}
And this is what an async function looks like:
#[cfg(feature = "async")]
async fn perform_query<A: ToSocketAddrs>(&self, payload: &[u8], addr: A) -> Result<Vec<u8>>
{
// More than 100 lines of code with occasional calls to async UdpSocket::send_to and recv.
// Apart from 3-4 await lines, it does mostly the same thing as its sync counterpart.
}
I found and fixed some bugs in the sync code and now I'm about to apply the fix to the async code as well. But then I noticed that since this large function is entirely duplicated, I'd need to patch my fixes into the async function too, and I started to wonder why most of this function is duplicated in the first place. It seems like hell to maintain this code in the long run, so I thought I'd do a favor by deduplicating this function... Then I ran into issues that made me aware it's not as trivial as I thought. I can certainly differentiate those few lines with compiler directives, and I could even write a macro that inserts the sync/async versions of the UdpSocket calls depending on whether the async feature is enabled. But then I realized I can't select the function headers via compiler directives, because #[cfg(...)] would apply to the entire function, so if I do something like this, I get massive syntax errors:
#[cfg(not(feature = "async"))]
fn perform_query<A: ToSocketAddrs>(&self, payload: &[u8], addr: A) -> Result<Vec<u8>>
#[cfg(feature = "async")]
async fn perform_query<A: ToSocketAddrs>(&self, payload: &[u8], addr: A) -> Result<Vec<u8>>
{
// Deduplicated code with occasional differentiation of sync / async UdpSocket calls.
}
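For what it's worth, the macro half of that idea seems doable with a cfg-gated helper along these lines (just a sketch of what I have in mind, nothing like it exists in the library):
// With the `async` feature on, the wrapped call gets awaited;
// without the feature, the expression is used as-is.
#[cfg(feature = "async")]
macro_rules! maybe_await {
    ($e:expr) => {
        $e.await
    };
}

#[cfg(not(feature = "async"))]
macro_rules! maybe_await {
    ($e:expr) => {
        $e
    };
}

// Inside the shared body: let sent = maybe_await!(socket.send_to(&packet, &addr))?;
// But the enclosing function still has to be `async fn` only in the async build,
// which is exactly the header problem above.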
I also thought of having only the async function as the core, with async and sync wrapper functions calling it depending on whether the library is compiled as sync or async. But then I can't call an async function from a sync function, or at least I'd need to do some ugly magic using an async runtime to await/poll the function and then return the result synchronously, and then the sync build of the library would have to import an async runtime anyway, which would be better to avoid.
My current idea is to move the processing of the packets into separate sync functions that would be called from sync and async wrappers, which would only deal with the actual UdpSocket calls, but I'm not sure if that's the right way to do it. I mean, isn't there a smoother, more elegant way? What is the general approach for this? Or is it normal to duplicate whopping functions for sync and async builds? As you may guess, I have no experience with async programming.
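To make that concrete, the kind of split I'm imagining is sketched below. Everything here is made up for illustration: the build_query/parse_response names, the fixed buffer size, passing the socket in explicitly instead of using &self, and tokio as the async socket (the real library might use a different runtime).
type Result<T> = std::io::Result<T>;

// I/O-free packet logic, shared verbatim by both builds.
fn build_query(payload: &[u8]) -> Vec<u8> {
    // ... encode the request packet ...
    payload.to_vec()
}

fn parse_response(bytes: &[u8]) -> Result<Vec<u8>> {
    // ... validate and decode the response ...
    Ok(bytes.to_vec())
}

// Thin sync wrapper: only the socket calls live here.
#[cfg(not(feature = "async"))]
fn perform_query<A: std::net::ToSocketAddrs>(socket: &std::net::UdpSocket, payload: &[u8], addr: A) -> Result<Vec<u8>> {
    socket.send_to(&build_query(payload), addr)?;
    let mut buf = [0u8; 1500];
    let n = socket.recv(&mut buf)?;
    parse_response(&buf[..n])
}

// Thin async wrapper: same shape, plus the awaits.
#[cfg(feature = "async")]
async fn perform_query<A: tokio::net::ToSocketAddrs>(socket: &tokio::net::UdpSocket, payload: &[u8], addr: A) -> Result<Vec<u8>> {
    socket.send_to(&build_query(payload), addr).await?;
    let mut buf = [0u8; 1500];
    let n = socket.recv(&mut buf).await?;
    parse_response(&buf[..n])
}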
Answer 1
Score: 5
> I also thought of having only the async function as the core, with async and sync wrapper functions calling it … then the sync build of the library would have to import an async runtime anyway …
This is how reqwest offers its blocking interface. I think it's a perfectly good way to do things, if the library is big enough that the async runtime is not a large additional compilation cost. It has the advantage that all of your IO works exactly the same way in all cases, reducing the chances of subtle bugs.
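As a rough sketch of that shape (not reqwest's actual code; tokio and the function names are just assumptions for illustration):
// The one and only implementation is async.
async fn perform_query_async(payload: &[u8]) -> std::io::Result<Vec<u8>> {
    // ... the real async I/O and packet handling ...
    Ok(payload.to_vec())
}

// The blocking interface is a thin facade that drives the async core
// to completion on a small, private runtime.
pub fn perform_query_blocking(payload: &[u8]) -> std::io::Result<Vec<u8>> {
    let runtime = tokio::runtime::Builder::new_current_thread()
        .enable_all()
        .build()?;
    runtime.block_on(perform_query_async(payload))
}
reqwest's real implementation is more elaborate, but the principle is the same: the sync build pays for the runtime and in exchange keeps a single IO code path.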
> My current idea is to move the processing of the packets into separate sync functions that would be called from sync and async wrappers, which would only deal with the actual UdpSocket calls
I recommend that you take this option — separating the algorithms from the IO. It has advantages beyond the code de-duplication you are currently aiming for:
- It is likely easier to write unit tests for the packet algorithms when you can express them as simple function calls — especially ones that handle edge cases in IO — than if you have to also set up a peer UDP socket to test anything (see the sketch after this list).
- If you make the algorithms public, it allows them to be used in unusual situations, such as ones not interacting with the operating system's networking stack:
  - no_std environments where the networking is custom and not known to Rust std or your async IO library
  - analysis of captures of traffic (non-real-time)
  - reimplementing the IO side for special requirements (e.g. passing specific flags to the OS) while still being able to use your library's algorithms
- Error handling, a necessary part of IO, may be clearer if it is less intertwined with the algorithms, as such a split would require.
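For example, a unit test for the pure packet code needs no sockets at all. The parse_response here is a made-up stand-in for whatever your real packet-handling function ends up being:
// Hypothetical pure function: decode a response datagram, rejecting
// obviously malformed input.
fn parse_response(bytes: &[u8]) -> std::io::Result<Vec<u8>> {
    if bytes.is_empty() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::InvalidData,
            "empty response",
        ));
    }
    Ok(bytes.to_vec())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn rejects_an_empty_datagram() {
        // An IO edge case, exercised without binding a single UDP socket.
        assert!(parse_response(&[]).is_err());
    }
}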
This style of library design is sometimes called “sans I/O” (at least by Python programmers). You can see it in Rust with, for example, the http library which provides HTTP parsing algorithms but no IO whatsoever.