How to fill values from a TSV file into a shell script template and create a new file named after the first-column value

Question

I have a tab-separated data.tsv file with 100 entries, like this:

ColumnA    ColumnB    ColumnC    ColumnD    ColumnE
Cell 31    Cell 4     Cell 3     Cell 5     Cell 8
Cell 21    Cell 2     Cell 5     Cell 6     Cell 9

I also have the template.in script below, whose placeholders need to be filled with the values from the TSV file:

for_example = $ColumnA
testmyhpothesis = $ColumnB $ColumnC
cleandir = $ColumnD $ColumnE
testOutdir = /path/todir/$ColumnD

For each row, I want to create a new script file named after its ColumnA value, for example:

File 1: Cell 31.in

for_example = Cell 31
testmyhpothesis = Cell 4 Cell 3
cleandir = Cell 5 Cell 8
testOutdir = /path/todir/Cell 5

File 2: Cell 21.in

for_example = Cell 21
testmyhpothesis = Cell 2 Cell 5
cleandir = Cell 6 Cell 9
testOutdir = /path/todir/Cell 6

My attempt so far:

awk -F '\t' '{for (i = 2; i <= NF; i++) {print $i >> new_file}}' data.tsv

Answer 1

Score: 2

Here's a solution that uses envsubst to replace the $ColumnXXX placeholders in template.in:

#!/bin/bash
{
    # get the names of the variables from the TSV header
    IFS=$'\t' read -r -a varnames
    export "${varnames[@]}" || exit 1

    # read each field into its corresponding variable
    while IFS=$'\t' read -r "${varnames[@]}"
    do
        # replace the expansions in "template.in"
        envsubst "${varnames[*]/#/$}" < template.in > "${!varnames[0]}.in"
    done
} < data.tsv
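
The two less common expansions in the script above can be tried in isolation. The following is a minimal sketch, not part of the original answer: it feeds the header and first row from the question through here-strings instead of data.tsv, and the names header, row, shell_format and out_file are made up for illustration.

```shell
#!/bin/bash
# Sample header and row, tab-separated (values taken from the question).
header=$'ColumnA\tColumnB\tColumnC\tColumnD\tColumnE'
row=$'Cell 31\tCell 4\tCell 3\tCell 5\tCell 8'

# Split the header into an array of variable names.
IFS=$'\t' read -r -a varnames <<< "$header"

# Assign each field of the row to the variable named by the matching header.
IFS=$'\t' read -r "${varnames[@]}" <<< "$row"

# Prefix every name with "$"; the elements are joined with a space (default IFS).
# This is the list of variables that envsubst is allowed to substitute.
shell_format="${varnames[*]/#/$}"

# Indirect expansion: the value of the variable whose name is in varnames[0].
out_file="${!varnames[0]}.in"

printf '%s\n' "$shell_format" "$out_file" "$ColumnD"
```

This should print `$ColumnA $ColumnB $ColumnC $ColumnD $ColumnE`, then `Cell 31.in`, then `Cell 5`. One thing to keep in mind: tab counts as IFS whitespace, so read collapses consecutive tabs, and an empty cell in the TSV would shift the later columns.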

Answer 2

Score: 1

Using any awk:

$ cat tst.awk
BEGIN { FS="\t" }
NR == FNR {
    tmplt[++numLines] = $0
    next
}
FNR == 1 {
    for ( fldNr=1; fldNr<=NF; fldNr++ ) {
        tag = "$" $fldNr
        tags2fldNrs[tag] = fldNr
    }
    next
}
{
    out = $(tags2fldNrs["$ColumnA"]) ".in"
    for ( lineNr=1; lineNr<=numLines; lineNr++ ) {
        line = tmplt[lineNr]
        for ( tag in tags2fldNrs ) {
            if ( s = index(line,tag) ) {
                fldNr = tags2fldNrs[tag]
                val = $fldNr
                line = substr(line,1,s-1) val substr(line,s+length(tag))
            }
        }
        print line > out
    }
    close(out)
}

$ awk -f tst.awk template.in data.tsv

$ head Cell*
==> Cell 21.in <==
#!/bin/bash

for_example = Cell 21
testmyhpothesis = Cell 2 Cell 5
cleandir = Cell 6 Cell 9
testOutdir = /path/todir/Cell 6

==> Cell 31.in <==
#!/bin/bash

for_example = Cell 31
testmyhpothesis = Cell 4 Cell 3
cleandir = Cell 5 Cell 8
testOutdir = /path/todir/Cell 5

This would fail if one column tag (name) is a substring of another, e.g. ColumnA and ColumnAB, and it assumes the values under ColumnA are always unique. Both assumptions are consistent with the example you provided, so post a new question if your example is wrong and you can't figure out how to adapt this to suit your real data.
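
Both assumptions can be checked against the real data before running the script. The sketch below is not from the original answer: duplicate_keys and overlapping_tags are hypothetical helper names, and both assume a tab-separated file with a header on the first line.

```shell
#!/bin/bash
# Hypothetical helper: print each first-column value that occurs more than
# once (the header line is skipped). No output means the keys are unique.
duplicate_keys() {
    awk -F'\t' 'NR > 1 && seen[$1]++ == 1 { print $1 }' "$1"
}

# Hypothetical helper: report header names that are contained in another
# header name. No output means no tag is a substring of another.
overlapping_tags() {
    awk -F'\t' 'NR == 1 {
        for (i = 1; i <= NF; i++)
            for (j = 1; j <= NF; j++)
                if (i != j && index($j, $i))
                    print $i " is contained in " $j
        exit
    }' "$1"
}
```

Running `duplicate_keys data.tsv` and `overlapping_tags data.tsv` on the sample from the question should print nothing. Any output points at rows that would overwrite each other's output file, or at tags that could be replaced incorrectly.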

huangapple
  • Published on May 25, 2023 at 19:27:46
  • When reposting, please keep this link: https://go.coder-hub.com/76331769.html