Pass each line of stdout as stdin to a new invocation of a tool

Question

I would like to pipe each line of stdout into the stdin of a separate invocation of another tool, essentially mapping over lines, in bash on the command line.

I know I can loop, but that's a bit verbose for such a simple task, and also slow:

cat out.ndjson | while read p; do
    echo "$p" | jj key
done

I know I could write a performant program, maplines, with an API like this myself:

cat ndjson | maplines jj key

where maplines reads stdin line by line and pipes each line into a new invocation of jj key.

It's important that jj key cannot accept the line as an argument; it needs the input on stdin.

It would be wasteful to reinvent the wheel if such a tool already exists. Alternatively, one could write a bash function. Surely something like this exists already.

Basically, my question is: how do I implement maplines with xargs or parallel? I researched quite a bit, but xargs seems to be unable to pass its input to the command's stdin.
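xargs only turns its input into command-line arguments, so with xargs each line would have to go through a wrapper shell that pipes it back into the command. GNU parallel, by contrast, has a --pipe mode that writes the input to each job's stdin. A minimal sketch of maplines built on it, assuming GNU parallel is installed (moreutils ships a different program also called parallel, with other options) and using the hypothetical name maplines_parallel:

```shell
#!/usr/bin/env bash
# Hypothetical helper: maplines on top of GNU parallel (not moreutils parallel).
#   --pipe  send input to each job on stdin instead of as arguments
#   -N1     give each job exactly one record; records are lines by default
#   -k      print outputs in input order, although jobs run concurrently
# parallel hands the command to a shell, so commands containing quotes or
# other shell metacharacters may need parallel's -q option.
maplines_parallel() {
    parallel --pipe -N1 -k "$@"
}

# Usage (assumes jj is installed): maplines_parallel jj key < out.ndjson
```

This still starts one process per line, and parallel adds its own per-job overhead, so it mainly pays off when each invocation does enough work to benefit from running several at once.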

cat > out.ndjson <<-EOM
{"key": "line1"}
{"key": "line2"}
{"key": "line3"}
EOM

jj is available via brew install tidwall/jj/jj, as prebuilt binaries, or from source. The exact command shouldn't really matter, but having it is useful for testing whether an answer works.

Answer 1

Score: 2

The main reason it's slow is that creating new processes is computationally expensive. Since you want to run a new instance of the tool for each line, some of that slowness is inevitable. But it doesn't need to be as slow as your current version, because echo "$p" | jj key creates two processes per line: one for the echo (a shell builtin, but each stage of a pipeline runs in its own subshell) and one for jj. In bash, you can replace the echo with a here-string, and similarly replace the cat | with a simple input redirection:

while IFS= read -r p; do   # IFS= and -r keep each line verbatim (spaces, backslashes)
    jj key <<<"$p"         # here-string: no separate process for echo
done <out.ndjson

With one process per line instead of two, this should roughly halve the overhead (the time spent beyond what the jj tool itself needs), so it can run at up to about twice the speed of the original.
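The loop above can be packaged as a function with the maplines API from the question. A minimal sketch (maplines is the question's hypothetical name, not an existing tool); it still starts one process per line, so it is no faster than the loop itself:

```shell
#!/usr/bin/env bash
# maplines: hypothetical helper that runs the given command once per input
# line, feeding that line to the command's stdin.
maplines() {
    local line
    # IFS= keeps leading/trailing whitespace and -r keeps backslashes, which
    # matters for JSON escapes such as \" inside a line.
    # "|| [[ -n $line ]]" also handles a final line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # The here-string gives the command its own stdin, so it cannot
        # swallow the remaining lines that the loop is reading.
        "$@" <<<"$line"
    done
}

# Usage (assumes jj is installed): maplines jj key < out.ndjson
```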

Answer 2

Score: 1

The Bash mapfile built-in may be able to do what you want. Try this ShellCheck-clean code:

#! /bin/bash -p

# mapfile calls this after every line (-c 1): $1 is the array index, $2 the line
function procline
{
    jj key <<<"$2"
}

mapfile -t -C procline -c 1 <out.ndjson
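To see what mapfile hands to the callback, a small sketch can swap procline for a hypothetical show_args function that just prints its arguments. Note that mapfile requires bash 4.0 or later; the /bin/bash shipped with macOS is 3.2, so a newer bash (for example from Homebrew) may be needed there.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical stand-in for procline that only prints what
# mapfile passes: "$1" is the index of the array element about to be
# assigned and "$2" is the line just read.
show_args() {
    printf 'index=%s line=%s\n' "$1" "$2"
}

# -c 1 runs the callback after every line. mapfile still stores every line
# in the array (named lines here), so the whole input is kept in memory.
# Inside a pipeline mapfile runs in a subshell, so lines is not visible
# afterwards; only the callback's output matters here.
printf 'first\nsecond\n' | mapfile -t -C show_args -c 1 lines
```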

huangapple
  • Published on 2023-03-10 01:25:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75688045.html