How to fill values into a shell script from a TSV file and create a new file named after the first-column value

Question

I have a data.tsv file with 100 entries, as shown below (the columns are tab-separated):

    ColumnA    ColumnB    ColumnC    ColumnD    ColumnE
    Cell 31    Cell 4     Cell 3     Cell 5     Cell 8
    Cell 21    Cell 2     Cell 5     Cell 6     Cell 9

I also have the template.in script below, whose placeholders need to be filled with the values from the TSV file:

    for_example = $ColumnA
    testmyhpothesis = $ColumnB $ColumnC
    cleandir = $ColumnD $ColumnE
    testOutdir = /path/todir/$ColumnD

For each row, I want to create a new script file named after its ColumnA value.

For example:

File 1: Cell 31.in

    for_example = Cell 31
    testmyhpothesis = Cell 4 Cell 3
    cleandir = Cell 5 Cell 8
    testOutdir = /path/todir/Cell 5

File 2: Cell 21.in

    for_example = Cell 21
    testmyhpothesis = Cell 2 Cell 5
    cleandir = Cell 6 Cell 9
    testOutdir = /path/todir/Cell 6

What I tried:

    awk -F '\t' '{for (i = 2; i <= NF; i++) {print $i >> new_file}}' data.tsv

Answer 1

Score: 2

Here's a solution that makes use of envsubst for replacing the $ColumnXXX in template.in:

    #!/bin/bash
    {
        # get the names of the variables from the TSV header
        IFS=$'\t' read -r -a varnames
        export "${varnames[@]}" || exit 1
        # read each field into its corresponding variable
        while IFS=$'\t' read -r "${varnames[@]}"
        do
            # replace the expansions in "template.in"
            envsubst "${varnames[*]/#/$}" < template.in > "${!varnames[0]}.in"
        done
    } < data.tsv
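
The two least obvious expansions in the script above can be tried on their own. The following is a minimal sketch that assumes bash; the array contents and the value of ColumnA are hard-coded here for illustration instead of being read from data.tsv, and shell_format and outfile are names introduced only for this demo. It does not call envsubst, it only shows the strings the script would hand to it.

```shell
#!/bin/bash
# Hypothetical header names and one value, mirroring the question's data.
varnames=(ColumnA ColumnB ColumnC)
ColumnA='Cell 31'

# "${varnames[*]/#/$}" prefixes every element with "$" and joins the
# results with a space (the first character of the default IFS).
shell_format="${varnames[*]/#/$}"
printf '%s\n' "$shell_format"     # prints: $ColumnA $ColumnB $ColumnC

# "${!varnames[0]}" is indirect expansion: varnames[0] holds the name
# "ColumnA", and the result is the value of the variable with that name.
outfile="${!varnames[0]}.in"
printf '%s\n' "$outfile"          # prints: Cell 31.in
```

When envsubst is given such a list as its argument, it substitutes only the variables named in it, so any other $-expressions in template.in should be left alone.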

Answer 2

Score: 1

Using any awk:

    $ cat tst.awk
    BEGIN { FS="\t" }
    # First file (template.in): store every template line
    NR == FNR {
        tmplt[++numLines] = $0
        next
    }
    # Header row of data.tsv: map each "$ColumnName" tag to its field number
    FNR == 1 {
        for ( fldNr=1; fldNr<=NF; fldNr++ ) {
            tag = "$" $fldNr
            tags2fldNrs[tag] = fldNr
        }
        next
    }
    # Data rows: write one output file per row, named after its ColumnA value
    {
        out = $(tags2fldNrs["$ColumnA"]) ".in"
        for ( lineNr=1; lineNr<=numLines; lineNr++ ) {
            line = tmplt[lineNr]
            # replace the first occurrence of each tag found on this line
            for ( tag in tags2fldNrs ) {
                if ( s = index(line,tag) ) {
                    fldNr = tags2fldNrs[tag]
                    val = $fldNr
                    line = substr(line,1,s-1) val substr(line,s+length(tag))
                }
            }
            print line > out
        }
        close(out)
    }

    $ awk -f tst.awk template.in data.tsv

    $ head Cell*
    ==> Cell 21.in <==
    #!/bin/bash
    for_example = Cell 21
    testmyhpothesis = Cell 2 Cell 5
    cleandir = Cell 6 Cell 9
    testOutdir = /path/todir/Cell 6
    ==> Cell 31.in <==
    #!/bin/bash
    for_example = Cell 31
    testmyhpothesis = Cell 4 Cell 3
    cleandir = Cell 5 Cell 8
    testOutdir = /path/todir/Cell 5

This would fail if some column tags (names) are substrings of others, e.g. ColumnA and ColumnAB, and it assumes the values under ColumnA are always unique. That's consistent with the example you provided, so post a new question if your example is wrong and you can't figure out how to adapt this to suit your real data.
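
One possible way around the substring caveat, offered only as a sketch and not as part of the answer above, is to consume whole "$name" tokens with match() instead of searching for each tag with index(). The demo data, the name2fld array and the outdir variable are assumptions made for this example, and it further assumes that column names consist only of letters, digits and underscores.

```shell
#!/bin/sh
# Demo input (hypothetical): ColumnA is a substring of ColumnAB.
workdir=$(mktemp -d) || exit 1
printf 'ColumnA\tColumnAB\nCell 1\tCell 2\n' > "$workdir/data.tsv"
printf '%s\n' 'first = $ColumnA' 'second = $ColumnAB $ColumnA' > "$workdir/template.in"

awk -v outdir="$workdir" '
BEGIN { FS = "\t" }
# First file: store the template lines
NR == FNR { tmplt[++numLines] = $0; next }
# Header row: map each column name to its field number
FNR == 1  { for (i = 1; i <= NF; i++) name2fld[$i] = i; next }
{
    out = outdir "/" $(name2fld["ColumnA"]) ".in"
    for (n = 1; n <= numLines; n++) {
        rest = tmplt[n]
        line = ""
        # Take one complete "$name" token at a time, so "$ColumnA"
        # can never match inside "$ColumnAB"
        while (match(rest, /\$[A-Za-z_][A-Za-z0-9_]*/)) {
            name = substr(rest, RSTART + 1, RLENGTH - 1)
            if (name in name2fld)
                val = $(name2fld[name])
            else
                val = substr(rest, RSTART, RLENGTH)   # unknown name: keep as is
            line = line substr(rest, 1, RSTART - 1) val
            rest = substr(rest, RSTART + RLENGTH)
        }
        print line rest > out
    }
    close(out)
}' "$workdir/template.in" "$workdir/data.tsv"

cat "$workdir/Cell 1.in"
# prints:
# first = Cell 1
# second = Cell 2 Cell 1
```

Because the loop walks along each line, this variant also replaces repeated occurrences of the same tag on one line. It still assumes that the values under ColumnA are unique.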

huangapple
  • Published on May 25, 2023, 19:27:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76331769.html