2023-02-18 22:37:46

Why is reading a file asynchronously (with aiofile) so much (15x) slower than its synchronous equivalent?

Question

I'm experimenting with named pipes and async approaches and was a bit surprised at how slow reading the file I've created seems to be.

As the linked question suggests, this effect is not limited to named pipes as in the example below; it applies to 'normal' files as well. Since my final goal is to read named pipes, I prefer to keep the examples below.

So here is what I initially came up with:

import os
import sys
from asyncio import create_subprocess_exec, gather, run
from asyncio.subprocess import DEVNULL

from aiofile import async_open


async def read_strace(namedpipe):
    # Copy everything strace writes to the FIFO into async.log, line by line.
    with open("async.log", "w") as outfp:
        async with async_open(namedpipe, "r") as npfp:
            async for line in npfp:
                outfp.write(line)


async def main(cmd):
    try:
        os.mkfifo("myfifo", 0o600)  # mkfifo returns None, so there is nothing to keep
        process = await create_subprocess_exec(
            "strace", "-o", "myfifo", *cmd,
            stdout=DEVNULL, stderr=DEVNULL)
        # Read the pipe and wait for strace concurrently.
        await gather(read_strace("myfifo"), process.wait())
    finally:
        os.unlink("myfifo")


run(main(sys.argv[1:]))

You can run it like ./async_program.py <CMD>, e.g. ./async_program.py find .

The synchronous version uses plain subprocess.Popen and reads what strace writes to myfifo:

import os
import sys
from subprocess import DEVNULL, Popen


def read_strace(namedpipe):
    # Copy everything strace writes to the FIFO into sync.log, line by line.
    with open("sync.log", "w") as outfp:
        with open(namedpipe, "r") as npfp:
            for line in npfp:
                outfp.write(line)


def main(cmd):
    try:
        os.mkfifo("myfifo", 0o600)
        process = Popen(
            ["strace", "-o", "myfifo", *cmd],
            stdout=DEVNULL, stderr=DEVNULL)
        read_strace("myfifo")  # returns at EOF, i.e. once strace closes the pipe
        process.wait()  # added: reap the child so it does not linger as a zombie
    finally:
        os.unlink("myfifo")


main(sys.argv[1:])

Running both programs with time reveals that the async program is about 15x slower:

$ time ./async_program.py  find .   
poetry run ./async_program.py find .  4.06s user 4.75s system 100% cpu 8.727 total
$ time ./sync_program.py find .
poetry run ./sync_program.py find .  0.27s user 0.07s system 76% cpu 0.438 total

The linked question suggests that aiofile is known to be somewhat slow, but 15x? I'm pretty sure I could still come close to the synchronous approach by using an extra thread and writing to a queue, but admittedly I haven't tried it yet.

Is there a recommended way to read a file asynchronously - maybe even an approach more dedicated to named pipes as I use them in the given example?
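
For reference, the thread-and-queue idea mentioned above could look roughly like the sketch below. It is untested against the timings shown here and uses only the standard library; the helper names (_pump_lines, read_lines_via_thread, demo) are made up for illustration, and the demo reads a regular temporary file rather than a FIFO.

```python
import asyncio
import os
import tempfile
import threading


def _pump_lines(path, loop, queue):
    """Blocking reader: runs in a worker thread and forwards lines to the loop."""
    with open(path, "r") as fp:
        for line in fp:
            # asyncio.Queue is not thread-safe, so hand each line to the loop thread.
            loop.call_soon_threadsafe(queue.put_nowait, line)
    # Sentinel: tells the consumer that the file hit EOF.
    loop.call_soon_threadsafe(queue.put_nowait, None)


async def read_lines_via_thread(path):
    """Async generator yielding lines that a background thread reads."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    threading.Thread(
        target=_pump_lines, args=(path, loop, queue), daemon=True
    ).start()
    while (line := await queue.get()) is not None:
        yield line


async def demo():
    # Demo input: a regular temp file; a FIFO path would be opened the same way.
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write("first\nsecond\nthird\n")
    try:
        return [line async for line in read_lines_via_thread(tmp.name)]
    finally:
        os.unlink(tmp.name)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Forwarding a batch of lines per queue item instead of a single line would reduce the number of cross-thread wakeups. Whether either variant gets close to the synchronous timing would have to be measured, and a real version would also need to handle the consumer stopping early.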

Answer 1

Score: 1

Async isn't magic. It pays off when you are calling something, or something is calling you, usually remotely, and there are delays from the network, file I/O, and so on, during which the program would otherwise sit idle.

In your case, with a single process reading a single file (named pipe or not), there won't be any meaningful I/O wait to overlap with other work.

So ALL your async is doing here is ADDING overhead: the read is put into an event loop, and the coroutine repeatedly releases control back to the loop.
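
To make the overhead argument concrete, here is a rough sketch that runs the same counting loop once synchronously and once with a trip through the event loop per item. It does not measure aiofile itself: `await asyncio.sleep(0)` is only an assumed stand-in for a per-line await, and the function names are made up for illustration.

```python
import asyncio
import time


def consume_sync(lines):
    """Plain loop: no event loop involved."""
    count = 0
    for _ in lines:
        count += 1
    return count


async def consume_async(lines):
    """Same loop, but yields to the event loop once per line."""
    count = 0
    for _ in lines:
        await asyncio.sleep(0)  # one trip through the scheduler per line
        count += 1
    return count


def compare(n=100_000):
    """Return (sync_count, async_count, sync_seconds, async_seconds)."""
    lines = ["x\n"] * n
    t0 = time.perf_counter()
    sync_count = consume_sync(lines)
    t1 = time.perf_counter()
    # Note: the async timing also includes event loop startup and shutdown.
    async_count = asyncio.run(consume_async(lines))
    t2 = time.perf_counter()
    return sync_count, async_count, t1 - t0, t2 - t1


if __name__ == "__main__":
    sync_count, async_count, sync_s, async_s = compare()
    print(f"sync:  {sync_count} lines in {sync_s:.4f}s")
    print(f"async: {async_count} lines in {async_s:.4f}s")
```

Both variants do the same work per line; any difference in the printed times comes from scheduling, which is the kind of cost the answer describes. The actual ratio depends on the machine and Python version.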
