Why is reading a file asynchronously (with aiofile) so much (15x) slower than its synchronous equivalent?
Question
I'm experimenting with named pipes and async approaches and was a bit surprised by how slow reading the file I've created seems to be.
As this question suggests, the effect is not limited to named pipes as in the example below but applies to 'normal' files as well. Since my final goal is to read those named pipes, I prefer to keep the examples below.
Here is what I initially came up with:
import os
import sys
from asyncio import create_subprocess_exec, gather, run
from asyncio.subprocess import DEVNULL

from aiofile import async_open


async def read_strace(namedpipe):
    # Copy every line strace writes to the FIFO into async.log.
    with open("async.log", "w") as outfp:
        async with async_open(namedpipe, "r") as npfp:
            async for line in npfp:
                outfp.write(line)


async def main(cmd):
    try:
        os.mkfifo("myfifo", 0o600)  # returns None, so there is nothing to assign
        process = await create_subprocess_exec(
            "strace", "-o", "myfifo", *cmd,
            stdout=DEVNULL, stderr=DEVNULL)
        # Read the FIFO while waiting for strace to exit.
        await gather(read_strace("myfifo"), process.wait())
    finally:
        os.unlink("myfifo")


run(main(sys.argv[1:]))
You can run it like ./async_program.py <CMD>, e.g. ./async_program.py find .
The synchronous version uses the default Popen and reads what strace writes to myfifo:
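import os
import sys
from subprocess import DEVNULL, Popen


def read_strace(namedpipe):
    # Copy every line strace writes to the FIFO into sync.log.
    with open("sync.log", "w") as outfp:
        with open(namedpipe, "r") as npfp:
            for line in npfp:
                outfp.write(line)


def main(cmd):
    try:
        os.mkfifo("myfifo", 0o600)  # returns None, so there is nothing to assign
        process = Popen(
            ["strace", "-o", "myfifo", *cmd],
            stdout=DEVNULL, stderr=DEVNULL)
        read_strace("myfifo")  # returns at EOF, once strace closes the FIFO
        process.wait()  # reap the child, as the async version does
    finally:
        os.unlink("myfifo")


main(sys.argv[1:])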
Running both programs with time reveals that the async program is about 15x slower:
$ time ./async_program.py find .
poetry run ./async_program.py find . 4.06s user 4.75s system 100% cpu 8.727 total
$ time ./sync_program.py find .
poetry run ./sync_program.py find . 0.27s user 0.07s system 76% cpu 0.438 total
The linked question suggests that aiofile is known to be somewhat slow, but 15x? I'm pretty sure I could still come close to the synchronous approach by using an extra thread and writing to a queue, but admittedly I haven't tried that yet.
Is there a recommended way to read a file asynchronously - maybe even an approach more dedicated to named pipes as I use them in the given example?
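For illustration, the "extra thread plus queue" idea mentioned above could look roughly like the sketch below. It is not taken from aiofile or from the question's code: the helper name read_lines and the hint parameter are made up for this example, and the lines are handed over in batches on the assumption that fewer hand-offs to the event loop mean less overhead.

```python
import asyncio
import threading


async def read_lines(path, hint=64 * 1024):
    """Yield lines from `path`, doing the blocking reads in a worker thread.

    `read_lines` and `hint` are hypothetical names for this sketch.
    """
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    def worker():
        try:
            with open(path, "r") as fp:
                # readlines(hint) returns a batch of whole lines of roughly
                # `hint` characters, and an empty list at EOF.
                while batch := fp.readlines(hint):
                    loop.call_soon_threadsafe(queue.put_nowait, batch)
        finally:
            # Sentinel: tell the consumer that no more batches will arrive.
            loop.call_soon_threadsafe(queue.put_nowait, None)

    threading.Thread(target=worker, daemon=True).start()

    while (batch := await queue.get()) is not None:
        for line in batch:
            yield line
```

With such a helper, the loop in read_strace would become `async for line in read_lines(namedpipe): outfp.write(line)`. Whether this actually gets close to the synchronous timing has not been measured here.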
Answer 1
Score: 1
Async isn't magic. Async pays off when you are calling something, or something is calling you, usually remotely, and there are I/O overhead and delays from the network, file I/O, etc.
In your case, with a single process reading a single file (named pipe or not), there won't be any I/O wait.
So all your async is doing here is adding overhead: the process is put into an event loop and control is released back to the loop repeatedly.
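To get a rough feel for that overhead, here is a minimal sketch that does the same trivial work twice: once in a plain loop, and once with control handed back to the event loop on every item. The `await asyncio.sleep(0)` is only a stand-in for one awaited read per line; it is an assumption of this example, not what aiofile does internally, and the function names are made up.

```python
import asyncio
import time


def plain_sum(n):
    """Baseline: a tight synchronous loop."""
    total = 0
    for i in range(n):
        total += i
    return total


async def yielding_sum(n):
    """Same work, but control returns to the event loop on every item."""
    total = 0
    for i in range(n):
        await asyncio.sleep(0)  # stand-in for one awaited read per line
        total += i
    return total


def main(n=100_000):
    start = time.perf_counter()
    plain_sum(n)
    sync_s = time.perf_counter() - start

    start = time.perf_counter()
    asyncio.run(yielding_sum(n))
    async_s = time.perf_counter() - start

    print(f"sync:  {sync_s:.3f}s")
    print(f"async: {async_s:.3f}s")


if __name__ == "__main__":
    main()
```

The absolute numbers depend on the machine and the Python version, so none are quoted here. The point is only that each trip through the event loop has a cost, and it is paid once per item.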