How to fill values from a TSV file into a shell script template and create a new file named after the first column value
Question
I have a data.tsv file with 100 entries, like this:
| ColumnA | ColumnB | ColumnC | ColumnD | ColumnE |
|---|---|---|---|---|
| Cell 31 | Cell 4 | Cell 3 | Cell 5 | Cell 8 |
| Cell 21 | Cell 2 | Cell 5 | Cell 6 | Cell 9 |
I also have the template.in script below, which needs to be filled in with the values from the TSV file:
for_example = $ColumnA
testmyhpothesis = $ColumnB $ColumnC
cleandir = $ColumnD $ColumnE
testOutdir = /path/todir/$ColumnD
For each row, I want to create a new script file named after its ColumnA value,
e.g.:
File 1. Cell 31.in
for_example = Cell 31
testmyhpothesis = Cell 4 Cell 3
cleandir = Cell 5 Cell 8
testOutdir = /path/todir/Cell 5
File 2. Cell 21.in
for_example = Cell 21
testmyhpothesis = Cell 2 Cell 5
cleandir = Cell 6 Cell 9
testOutdir = /path/todir/Cell 6
awk -F '\t' '{for (i = 2; i <= NF; i++) {print $i >> new_file}}' data.tsv
Answer 1
Score: 2
Here's a solution that makes use of envsubst for replacing the $ColumnXXX in template.in:
#!/bin/bash
{
    # get the names of the variables from the TSV header
    IFS=$'\t' read -r -a varnames
    export "${varnames[@]}" || exit 1
    # read each field into its corresponding variable
    while IFS=$'\t' read -r "${varnames[@]}"
    do
        # replace the expansions in "template.in"
        envsubst "${varnames[*]/#/$}" < template.in > "${!varnames[0]}.in"
    done
} < data.tsv
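If envsubst (shipped with GNU gettext) is not installed, the same idea can be sketched in pure bash with parameter expansion. This is only an illustration, not the answer's method: the demo files are a two-row sample of the question's data, the `names`, `values`, `template` and `text` variables are names chosen here, and it assumes that no field is empty, that no value contains `&` (which bash 5.2 and later may treat specially in a replacement), and that no column name is a prefix of another.

```shell
#!/bin/bash
# Demo setup: a two-row sample of the question's data (tab-separated).
printf 'ColumnA\tColumnB\tColumnC\tColumnD\tColumnE\n' > data.tsv
printf 'Cell 31\tCell 4\tCell 3\tCell 5\tCell 8\n' >> data.tsv
printf 'Cell 21\tCell 2\tCell 5\tCell 6\tCell 9\n' >> data.tsv

cat > template.in <<'EOF'
for_example = $ColumnA
testmyhpothesis = $ColumnB $ColumnC
cleandir = $ColumnD $ColumnE
testOutdir = /path/todir/$ColumnD
EOF

# Read the whole template once; trailing newlines are stripped here
# and one is added back when writing each output file.
template=$(< template.in)

{
    # Header row: column names (assumed hypothetical array name "names").
    IFS=$'\t' read -r -a names
    # Data rows: one array of values per line.
    while IFS=$'\t' read -r -a values
    do
        text=$template
        for i in "${!names[@]}"
        do
            # Replace every literal "$Name" with the value from the same column.
            text=${text//"\$${names[i]}"/${values[i]}}
        done
        # Name the output file after the first column's value.
        printf '%s\n' "$text" > "${values[0]}.in"
    done
} < data.tsv
```

Running it in an empty directory should leave `Cell 31.in` and `Cell 21.in` next to the two input files. Unlike the awk answer further down, this replaces every occurrence of a tag, not just the first one on each line.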
Answer 2
Score: 1
Using any awk:
$ cat tst.awk
BEGIN { FS="\t" }
NR == FNR {
    tmplt[++numLines] = $0
    next
}
FNR == 1 {
    for ( fldNr=1; fldNr<=NF; fldNr++ ) {
        tag = "$" $fldNr
        tags2fldNrs[tag] = fldNr
    }
    next
}
{
    out = $(tags2fldNrs["$ColumnA"]) ".in"
    for ( lineNr=1; lineNr<=numLines; lineNr++ ) {
        line = tmplt[lineNr]
        for ( tag in tags2fldNrs ) {
            if ( s = index(line,tag) ) {
                fldNr = tags2fldNrs[tag]
                val = $fldNr
                line = substr(line,1,s-1) val substr(line,s+length(tag))
            }
        }
        print line > out
    }
    close(out)
}
$ awk -f tst.awk template.in data.tsv
$ head Cell*
==> Cell 21.in <==
#!/bin/bash
for_example = Cell 21
testmyhpothesis = Cell 2 Cell 5
cleandir = Cell 6 Cell 9
testOutdir = /path/todir/Cell 6
==> Cell 31.in <==
#!/bin/bash
for_example = Cell 31
testmyhpothesis = Cell 4 Cell 3
cleandir = Cell 5 Cell 8
testOutdir = /path/todir/Cell 5
This would fail if some column tags (names) are substrings of others, e.g. ColumnA and ColumnAB, and it assumes the values under ColumnA are always unique. That's consistent with the example you provided, so post a new question if your example is wrong and you can't figure out how to adapt this to your real data.
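The uniqueness assumption can be checked before generating any files. The following is a minimal sketch using standard tools; `sample.tsv` and its repeated `Cell 31` row are made up here purely to show what a failing check looks like, and `dups` is a name chosen for illustration.

```shell
#!/bin/bash
# Hypothetical sample with a repeated ColumnA value, for illustration only.
printf 'ColumnA\tColumnB\n' > sample.tsv
printf 'Cell 31\tCell 4\nCell 21\tCell 2\nCell 31\tCell 7\n' >> sample.tsv

# Skip the header, keep the first (tab-separated) column,
# and list the values that occur more than once.
dups=$(tail -n +2 sample.tsv | cut -f1 | sort | uniq -d)

if [ -n "$dups" ]; then
    printf 'Duplicate ColumnA values:\n%s\n' "$dups"
fi
```

If this prints anything for the real data.tsv, later rows would overwrite the files written for earlier rows with the same ColumnA value.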