Pass each line of stdout as stdin to a new invocation of a tool
Question
I would like to pipe each line of stdout into stdin of a separate invocation of another tool, essentially mapping over lines, in bash on the command line.
I know I can loop, but that's a bit verbose for such a simple task, and also slow:
cat out.ndjson | while read p; do \
echo "$p" | jj key
done
I know I could write a performant program maplines with such an API myself:
cat ndjson | maplines jj key
where maplines reads stdin line by line and pipes each line to a new invocation of jj key.
It's important that jj key cannot accept the line as an argument. It needs the input on stdin.
It would be wasteful to reinvent the wheel if there's a tool already. Or one could write a bash function. Surely something like that exists already.
Basically my question is: how do I implement maplines in xargs or parallel? I researched quite a bit, but xargs seems to be unable to pass things on stdin.
cat > out.ndjson <<- EOM
{"key": "line1"}
{"key": "line2"}
{"key": "line3"}
EOM
jj is available with brew install tidwall/jj/jj, as binaries or from source. While it shouldn't really matter what the command is, it's useful to test whether an answer works.
Answer 1
Score: 2
The main reason it's slow is that creating new processes is computationally expensive. Since you want to run a new instance of the tool for each line, some of that slowness is inevitable. But it doesn't need to be as slow as your current version, because echo "$p" | jj key creates two processes per line, one to do the echo, and one for jj. In bash, you can replace the echo with a here-string, and similarly replace the cat | with a simple input redirection:
# -r keeps backslashes (such as JSON escapes) intact; IFS= keeps surrounding whitespace
while IFS= read -r p; do
    jj key <<<"$p"
done <out.ndjson
This should cut the overhead (the time it takes above what the jj tool itself takes) in half, meaning it'll run at up to double the speed of the original.
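If the maplines API from the question is what you are after, the same loop can be wrapped in a function. This is only a sketch: maplines is the hypothetical name from the question, and tr stands in for jj key in the demo so that it runs without jj installed.

```shell
#!/usr/bin/env bash
# maplines CMD [ARGS...]
# Run CMD once per input line, feeding that line to CMD on stdin.
# (maplines is the hypothetical name used in the question.)
maplines() {
    local line
    # IFS= and -r keep whitespace and backslashes intact; the || clause
    # also processes a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        "$@" <<<"$line"
    done
}

# Demo with a stand-in command: each line goes to its own tr process.
printf '%s\n' '{"key": "line1"}' '{"key": "line2"}' | maplines tr a-z A-Z
```

With jj installed, the call from the question would then be maplines jj key <out.ndjson. It still starts one process per line, so the per-line cost is the same as in the loop above.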
Answer 2
Score: 1
The Bash mapfile built-in may be able to do what you want. Try this Shellcheck-clean code:
#! /bin/bash -p
function procline
{
jj key <<<"$2"
}
mapfile -t -C procline -c 1 <out.ndjson
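For context, mapfile passes the -C callback two extra arguments: the index of the array element about to be assigned, and the line itself. That is why procline reads "$2". A small sketch to make this visible, where show_line is a hypothetical callback name and tr stands in for jj key:

```shell
#!/usr/bin/env bash
# show_line is a hypothetical callback; tr stands in for jj key.
show_line() {
    # $1 = index of the array element about to be assigned
    # $2 = the line that was just read
    printf '%s: ' "$1"
    tr a-z A-Z <<<"$2"
}

# -c 1 runs the callback after every line; -t drops the trailing newline.
# The lines are also collected in the array "lines" as a side effect.
mapfile -t -C show_line -c 1 lines < <(printf '%s\n' foo bar)
```

Note that mapfile still stores every line in the array (MAPFILE when no name is given), so for very large inputs the plain while read loop may use less memory.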