How to fill values from a TSV file into a shell script template and create a new file named after the first-column value
Question
I have a data.tsv file with 100 entries, like this:
ColumnA | ColumnB | ColumnC | ColumnD | ColumnE |
---|---|---|---|---|
Cell 31 | Cell 4 | Cell 3 | Cell 5 | Cell 8 |
Cell 21 | Cell 2 | Cell 5 | Cell 6 | Cell 9 |
I also have the template.in script below, which needs to be filled in with the values from the TSV file:
for_example = $ColumnA
testmyhpothesis = $ColumnB $ColumnC
cleandir = $ColumnD $ColumnE
testOutdir = /path/todir/$ColumnD
For each row, I want to create a new script file named after its ColumnA value, e.g.:
File 1: Cell 31.in
for_example = Cell 31
testmyhpothesis = Cell 4 Cell 3
cleandir = Cell 5 Cell 8
testOutdir = /path/todir/Cell 5
File 2: Cell 21.in
for_example = Cell 21
testmyhpothesis = Cell 2 Cell 5
cleandir = Cell 6 Cell 9
testOutdir = /path/todir/Cell 6
awk -F '\t' '{for (i = 2; i <= NF; i++) {print $i >> new_file}}' data.tsv
Answer 1
Score: 2
Here's a solution that uses `envsubst` to replace the `$ColumnXXX` placeholders in `template.in`:
#!/bin/bash
{
    # get the names of the variables from the TSV header
    IFS=$'\t' read -r -a varnames
    export "${varnames[@]}" || exit 1

    # read each field into its corresponding variable
    while IFS=$'\t' read -r "${varnames[@]}"
    do
        # replace the expansions in "template.in"
        envsubst "${varnames[*]/#/$}" < template.in > "${!varnames[0]}.in"
    done
} < data.tsv
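The two less common expansions in that script can be tried in isolation. The following is only a small sketch, assuming bash and three hypothetical header names taken from the sample data; it does not call `envsubst` itself, it just shows what would be passed to it and how the output file name is derived.

```shell
#!/bin/bash
# Hypothetical header names, as they would be read from the first line of data.tsv
varnames=(ColumnA ColumnB ColumnC)

# "${varnames[*]/#/$}" prefixes every element with "$" and joins the
# results with a space (the first character of the default IFS).
shell_format="${varnames[*]/#/$}"
echo "$shell_format"

# "${!varnames[0]}" is an indirect expansion: it yields the value of the
# variable whose name is stored in varnames[0].
ColumnA='Cell 31'
outfile="${!varnames[0]}.in"
echo "$outfile"
```

The first string is the list of variable references that `envsubst` is told to substitute, so any other `$something` in the template should be left alone.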
Answer 2
Score: 1
Using any awk:
$ cat tst.awk
BEGIN { FS="\t" }
NR == FNR {
    tmplt[++numLines] = $0
    next
}
FNR == 1 {
    for ( fldNr=1; fldNr<=NF; fldNr++ ) {
        tag = "$" $fldNr
        tags2fldNrs[tag] = fldNr
    }
    next
}
{
    out = $(tags2fldNrs["$ColumnA"]) ".in"
    for ( lineNr=1; lineNr<=numLines; lineNr++ ) {
        line = tmplt[lineNr]
        for ( tag in tags2fldNrs ) {
            if ( s = index(line,tag) ) {
                fldNr = tags2fldNrs[tag]
                val = $fldNr
                line = substr(line,1,s-1) val substr(line,s+length(tag))
            }
        }
        print line > out
    }
    close(out)
}
$ awk -f tst.awk template.in data.tsv
$ head Cell*
==> Cell 21.in <==
#!/bin/bash
for_example = Cell 21
testmyhpothesis = Cell 2 Cell 5
cleandir = Cell 6 Cell 9
testOutdir = /path/todir/Cell 6
==> Cell 31.in <==
#!/bin/bash
for_example = Cell 31
testmyhpothesis = Cell 4 Cell 3
cleandir = Cell 5 Cell 8
testOutdir = /path/todir/Cell 5
That would fail if you could have column tags (names) that are substrings of others, e.g. `ColumnA` and `ColumnAB`, and it assumes the values under `ColumnA` are always unique. That's consistent with the example you provided, so post a new question if your example is wrong and you can't figure out how to adapt this to suit your real data.