Why is reading a file asynchronously (with aiofile) so much (15x) slower than its synchronous equivalent?

Question

I'm experimenting with named pipes and async approaches and was a bit surprised how slow reading the file I've created seems to be.

As this question suggests, the effect is not limited to named pipes as in the example below but applies to 'normal' files as well. Since my final goal is reading those named pipes, I prefer to keep the examples below.

So here is what I initially came up with:

import sys, os
from asyncio import create_subprocess_exec, gather, run
from asyncio.subprocess import DEVNULL
from aiofile import async_open

async def read_strace(namedpipe):
    # Copy every line strace writes to the FIFO into async.log.
    with open("async.log", "w") as outfp:
        async with async_open(namedpipe, "r") as npfp:
            async for line in npfp:
                outfp.write(line)

async def main(cmd):
    try:
        os.mkfifo("myfifo", 0o600)
        # strace writes its trace of `cmd` into the FIFO.
        process = await create_subprocess_exec(
            "strace", "-o", "myfifo", *cmd,
            stdout=DEVNULL, stderr=DEVNULL)
        await gather(read_strace("myfifo"), process.wait())
    finally:
        os.unlink("myfifo")

run(main(sys.argv[1:]))

You can run it like ./async_program.py <CMD>, e.g. ./async_program.py find .

The synchronous counterpart uses a plain subprocess.Popen and reads what strace writes to myfifo:

from subprocess import Popen, DEVNULL
import sys, os

def read_strace(namedpipe):
    # Copy every line strace writes to the FIFO into sync.log.
    with open("sync.log", "w") as outfp:
        with open(namedpipe, "r") as npfp:
            for line in npfp:
                outfp.write(line)

def main(cmd):
    try:
        os.mkfifo("myfifo", 0o600)
        # strace writes its trace of `cmd` into the FIFO.
        process = Popen(
            ["strace", "-o", "myfifo", *cmd],
            stdout=DEVNULL, stderr=DEVNULL)
        read_strace("myfifo")  # returns once strace closes the FIFO
        process.wait()         # reap strace, like the async version does
    finally:
        os.unlink("myfifo")

main(sys.argv[1:])

Running both programs with time shows that the async program is about 15x slower in user CPU time (4.06 s vs. 0.27 s) and roughly 20x slower in wall-clock time (8.727 s vs. 0.438 s):

$ time ./async_program.py  find .   
poetry run ./async_program.py find .  4.06s user 4.75s system 100% cpu 8.727 total
$ time ./sync_program.py find .
poetry run ./sync_program.py find .  0.27s user 0.07s system 76% cpu 0.438 total

The linked question suggests that aiofile is known to be somewhat slow, but 15x? I'm pretty sure I could get close to the synchronous approach by using an extra thread that writes to a queue, but admittedly I haven't tried it yet.
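The thread-plus-queue idea above could look roughly like the following. This is a sketch, not code from the linked question: a plain thread does the blocking reads and hands batches of lines to an asyncio.Queue through loop.call_soon_threadsafe(), because asyncio.Queue itself is not thread-safe. The names _pump and read_lines_in_thread and the 64 KiB batch size are made up for illustration; Python 3.8+ is assumed for the walrus operator.

```python
import asyncio
import threading

_EOF = object()  # sentinel: the worker thread has finished


def _pump(path, loop, queue, hint=64 * 1024):
    """Worker thread: blocking reads, handing batches of lines to the event loop."""
    try:
        with open(path, "r") as fp:
            # readlines(hint) returns whole lines totalling roughly `hint` characters,
            # so the event loop is woken once per batch instead of once per line.
            while batch := fp.readlines(hint):
                loop.call_soon_threadsafe(queue.put_nowait, batch)
    except Exception as exc:  # pass errors such as FileNotFoundError to the consumer
        loop.call_soon_threadsafe(queue.put_nowait, exc)
    finally:
        loop.call_soon_threadsafe(queue.put_nowait, _EOF)


async def read_lines_in_thread(path):
    """Async generator yielding the lines of `path`, read by a background thread."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()  # unbounded; only ever touched from the loop thread
    threading.Thread(target=_pump, args=(path, loop, queue), daemon=True).start()
    while True:
        item = await queue.get()
        if item is _EOF:
            return
        if isinstance(item, Exception):
            raise item
        for line in item:
            yield line
```

In the question's program, `async for line in read_lines_in_thread("myfifo"):` would take the place of the aiofile loop. Batching trades latency for throughput: with a FIFO, a batch is delivered only once about `hint` characters (or EOF) have arrived. The sketch also does not stop the thread if the consumer quits early.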

Is there a recommended way to read a file asynchronously, or maybe even an approach dedicated to named pipes like the ones in the example above?
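For the named-pipe case, asyncio can watch the FIFO directly instead of going through a file-I/O library: loop.connect_read_pipe() accepts pipes, FIFOs and sockets on Unix and can feed an asyncio.StreamReader. A minimal sketch, assuming Python 3.9+ (for asyncio.to_thread) on Linux or another Unix; open_fifo_reader and copy_fifo are hypothetical names. Opening a FIFO for reading blocks until a writer appears, so that one call is pushed to a thread.

```python
import asyncio


async def open_fifo_reader(path, limit=2 ** 16):
    """Return (StreamReader, transport) for reading the named pipe at `path`."""
    loop = asyncio.get_running_loop()
    # open() on a FIFO blocks until some process opens the write end,
    # so run that single blocking call in a worker thread.
    pipe = await asyncio.to_thread(open, path, "rb", buffering=0)
    reader = asyncio.StreamReader(limit=limit)
    # connect_read_pipe() switches the fd to non-blocking mode and registers it
    # with the event loop, which then wakes this task only when data is available.
    transport, _ = await loop.connect_read_pipe(
        lambda: asyncio.StreamReaderProtocol(reader), pipe)
    return reader, transport


async def copy_fifo(path, out_path):
    """Copy every line written to the FIFO at `path` into `out_path`."""
    reader, transport = await open_fifo_reader(path)
    try:
        with open(out_path, "wb") as out:
            async for line in reader:  # yields bytes lines until the writer closes
                out.write(line)
    finally:
        transport.close()  # no-op if EOF already closed the transport
```

In the question's main(), `read_strace("myfifo")` could be swapped for `copy_fifo("myfifo", "async.log")`. StreamReader's default line limit is 64 KiB and longer lines make iteration raise ValueError, which is why `limit` is exposed as a parameter.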

Answer 1

Score: 1

Async isn't magic. It helps when you are calling something, or something is calling you, usually remotely, and there is I/O latency (network, disk, etc.) during which other work can run.

In your case there is nothing to overlap: a single process reads a single file (named pipe or not), so no other task can use the time spent waiting for I/O.

So ALL your async is doing here is ADDING overhead: each read is handed to the event loop, and control is released back to the loop over and over.
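A rough way to see that overhead without aiofile: count lines plainly, with one await per line (a thread-pool round trip via asyncio.to_thread), and with one await per ~64 KiB batch. This only mimics a per-operation hand-off pattern; treating aiofile's cost as similar in kind is an assumption, not a model of its internals. The function names and the 10,000-line input are illustrative.

```python
import asyncio
import time


def count_sync(path):
    """Plain blocking iteration: no event loop involved."""
    with open(path) as fp:
        return sum(1 for _ in fp)


async def count_one_await_per_line(path):
    """One thread-pool round trip (and event-loop wakeup) for every single line."""
    n = 0
    with open(path) as fp:
        while await asyncio.to_thread(fp.readline):
            n += 1
    return n


async def count_batched(path, hint=1 << 16):
    """Same hand-off, but each round trip fetches about `hint` characters of lines."""
    n = 0
    with open(path) as fp:
        while batch := await asyncio.to_thread(fp.readlines, hint):
            n += len(batch)
    return n


if __name__ == "__main__":
    import os
    import tempfile

    # Hypothetical input: 10,000 short lines in a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.writelines(f"line {i}\n" for i in range(10_000))
    try:
        for label, run_once in [
            ("sync", lambda: count_sync(tmp.name)),
            ("await per line", lambda: asyncio.run(count_one_await_per_line(tmp.name))),
            ("batched awaits", lambda: asyncio.run(count_batched(tmp.name))),
        ]:
            start = time.perf_counter()
            n = run_once()
            print(f"{label:<15} {n} lines in {time.perf_counter() - start:.3f} s")
    finally:
        os.unlink(tmp.name)
```

Exact numbers depend on the machine and Python version. One would expect the per-line variant to be clearly the slowest and batching to recover most of the gap, since the cost is per hand-off rather than per byte.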

huangapple
  • Published on 2023-02-18 22:37:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75494055.html