BASH: redirect stdin to variable number of subprocesses

Question

As the title says, I would like to redirect stdin to a variable number of output subprocesses.

If I had to redirect to output files I could do something like

files=(file_1 file_2 ... file_n)
tee ${files[*]} >/dev/null

but with subprocesses (using process substitutions, specifically), doing things like

programs=(">(exe_1 args_1)" ... ">(exe_n args_n)")
tee ${programs[@]} >/dev/null

will not interpret the >() as process substitutions but as literal filenames (for security reasons, I assume); also, the flags within the substitutions are interpreted as flags of tee.

Is it possible to read ONE LINE from stdin and redirect it to all these processes (which, again, are variable in number: n is unknown)? Have I missed something, somewhere?

Thanks in advance and sorry for my bad English.

Answer 1

Score: 4

Instead of using process substitution, create a bunch of named pipes in a loop, and run each process with its stdin redirected to one of the pipes. Then use tee to write to all the pipes.

progs=(exe_1 exe_2 ...)
args=(args_1 args_2 ...)
pipes=()
arraylength=${#progs[@]}

for (( i=0; i<arraylength; i++ ))
do
    pipe=/tmp/pipe.$$.$i
    # Create and record the pipe in the foreground, so pipes+=() takes effect in this shell
    mkfifo "$pipe" || continue
    pipes+=("$pipe")
    # Only the reader runs in the background; it blocks until tee opens the pipe for writing
    "${progs[i]}" "${args[i]}" < "$pipe" &
done

tee "${pipes[@]}" > /dev/null
# Clean up
rm -f "${pipes[@]}"

This solution has each program run with exactly 1 argument. It's hard to make it more general robustly because bash doesn't have 2-dimensional arrays.
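
One way around the one-argument limit, offered only as a sketch: wrap each command in a shell function, so the array holds function names and each function carries whatever arguments it needs. The names consumer_1, consumer_2, consumers and tee_to_fifos are hypothetical, and the sed and tr commands are placeholders for real programs. The sketch also assumes mktemp -d is available for a private pipe directory.

```shell
#!/usr/bin/env bash
# Hypothetical consumers: each function wraps one command with any number of arguments.
consumer_1() { sed -e 's/^/[1] /'; }
consumer_2() { tr 'a-z' 'A-Z'; }
consumers=(consumer_1 consumer_2)

# Copy stdin to every consumer through one named pipe per consumer.
tee_to_fifos() {
    local dir pipe i
    local pipes=()
    dir=$(mktemp -d) || return 1
    for i in "${!consumers[@]}"; do
        pipe=$dir/pipe.$i
        mkfifo "$pipe" || return 1
        pipes+=("$pipe")
        # The reader blocks in the background until tee opens the pipe for writing.
        "${consumers[i]}" < "$pipe" &
    done
    tee "${pipes[@]}" > /dev/null
    wait            # let every consumer finish before removing the pipes
    rm -rf "$dir"
}

# Usage: the order of the consumers' output lines is not guaranteed.
printf 'a\nb\n' | tee_to_fifos
```

The trade-off is that the set of commands has to be known when the functions are defined, so this suits a script with a fixed menu of consumers better than one that builds commands from user input.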

Answer 2

Score: 2

Unfortunately, this is a job for eval. Given something like:

all_redirections=""

add_redirection() {
  local argv_q
  printf -v argv_q '%q ' "$@"          # generate an eval-safe escaping of "$@"
  all_redirections+=" >(${argv_q})"    # append it as a process substitution for eval to parse
}

# The redirection sits inside the eval string so that only tee's own stdout is discarded
to_all_redirections() { eval "tee ${all_redirections} >/dev/null"; }

...you can run:

add_redirection exe_1 arg_1_a arg_1_b
add_redirection exe_2 arg_2_a
# ...
add_redirection exe_n arg_n

...and then, when you have a program whose output is to be copied to the input of those executables:

yourprogram | to_all_redirections
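
To illustrate why the printf '%q' step matters, here is a small round-trip sketch. The helper name quote_argv is hypothetical and exists only for this demonstration. It shows that arguments containing spaces, a dollar sign, or a literal >( survive eval unchanged once they have been escaped.

```shell
#!/usr/bin/env bash
# quote_argv is a hypothetical helper: it prints its arguments escaped for reuse by eval.
quote_argv() {
    local quoted
    printf -v quoted '%q ' "$@"
    printf '%s\n' "$quoted"
}

# Build a command string from arguments that would otherwise be re-split or expanded.
cmd=$(quote_argv printf '%s\n' 'two words' '$HOME' '>(not a substitution)')

# eval re-parses the string; each escaped argument comes back as a single literal word.
eval "$cmd"
```

Because every argument is escaped, anything that should act as shell syntax (such as the >( ... ) wrapper itself) has to be added around the escaped text rather than passed through the helper.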

Answer 3

Score: 1

This Shellcheck-clean code doesn't use eval and can handle any number of programs with any number of arguments:

#! /bin/bash -p

prog_args=( ::: sed -e 's/^/[1] /'
            ::: sed -e 's/^/[2] /'
            ::: sed -e 's/^/[3] /'  )

exec 3>&1

function tee_to_progs
{
    (( $# < 2 )) && return 1

    local -r startstr=$1
    shift

    local pargs=()
    while [[ $# -gt 0 && $1 != "$startstr" ]]; do
        pargs+=( "$1" )
        shift
    done

    if (( $# == 0 )); then
        # Last program to run.  It can just read stdin and write stdout.
        "${pargs[@]}"
    else
        tee >("${pargs[@]}" >&3) | tee_to_progs "$@"
    fi
}

tee_to_progs "${prog_args[@]}"
  • The prog_args array holds the programs and arguments to be run. Since Bash doesn't support nested arrays, each command is preceded by a marker string so that the separate programs and their arguments can be identified. I used the string ::: (because it is used for a similar purpose in GNU Parallel), but any string that isn't used as a program name or argument could be used instead. The code assumes that the first string (whatever it happens to be) in the array is the marker string, so the code doesn't need to be changed if the string is changed. I tested using _ instead of :::.
  • The sed commands are just examples. I used them for testing because the output of each program can be easily identified.
  • Note that this code only supports running simple commands. If you need to run other commands (e.g. ones that use redirections) you will need to use a different approach (probably involving the dreaded eval).
  • exec 3>&1 causes file descriptor number 3 to be associated with standard output. It is used in the code to ensure that output goes to the "real" standard output.
  • The tee_to_progs function runs the first program in its list of arguments, with input duplicated to it by tee, and pipes the tee output to a recursive call of the function, which does the same for the second and subsequent programs.
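
The marker convention can be seen in isolation with a small sketch that only splits the flat argument list into groups and prints them, without running anything. The function name split_by_marker is hypothetical, and the loop is a non-recursive variant of the grouping logic rather than the answer's own code.

```shell
#!/usr/bin/env bash
# split_by_marker (hypothetical name): treat the first argument as the marker
# and print each marker-delimited group of the remaining arguments on its own line.
split_by_marker() {
    (( $# )) || return 1
    local -r marker=$1
    shift

    local group=()
    local arg
    for arg in "$@"; do
        if [[ $arg == "$marker" ]]; then
            # A marker closes the current group, if it has any words in it.
            (( ${#group[@]} )) && printf '%s\n' "${group[*]}"
            group=()
        else
            group+=("$arg")
        fi
    done
    # Print the final group, which has no trailing marker.
    (( ${#group[@]} )) && printf '%s\n' "${group[*]}"
    return 0
}

split_by_marker ::: sed -e 's/^/[1] /' ::: tr a-z A-Z
```

Joining each group with spaces is only for display; code that actually runs the commands should keep the words in an array, as tee_to_progs does with pargs.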

Answer 4

Score: 1

GNU Parallel has --tee for this:

cat input | parallel --tee --pipe my_program {} ::: arg1 arg2 arg3

Internally it works very much like Barmar's solution, but you get GNU Parallel's output control:

  • the output is serialized, so output from two jobs is not mixed
  • you can keep the order --keep-order
  • you can --tag each line

Cleanup of temporary files also happens before the job finishes, so you do not need to clean up temporary files if the script is killed.

E.g.

seq 10000 | parallel --tee --pipe --tag --keep-order grep {} ::: {1..9}

huangapple
  • Posted on July 28, 2023, 06:00:24
  • Please keep this link when reposting: https://go.coder-hub.com/76783663.html