Loop an AWK script with variables
Question
I'm trying to loop an AWK script that contains two conditions and a variable taken from a given list. The goal is to extract a line when column one and column three both meet a condition (the text in both columns has to partially match the same name).
My input file looks like this:
pop1_io 1 pop1_ei 2 1 62027313 63797977 3.047
pop1_eg 1 pop2_yu 2 1 74240214 78974955 3.827
pop3_ab 1 pop1_zx 2 1 160604473 163511425 4.04
The first script I wrote works perfectly if I manually type the name I need, but it doesn't work if I try to loop it and insert variables into the awk script.
Working version:
awk '{if ($1 ~ /pop1/ && $3 ~ /pop1/)
print $1"\t" $2 "\t" $3 "\t" $4"\t" $5 "\t" $6 "\t" $7 "\t" $8}' inputfile.ibd | sed -r '/^\s*$/d' > pop1.ibd
Non-working versions:
pops="pop1 pop2 pop3"
for pop in $pops
do
awk '{if ($1 ~ /$pop/ && $3 ~ /$pop/)
print $1"\t" $2 "\t" $3 "\t" $4"\t" $5 "\t" $6 "\t" $7 "\t" $8}' inputfile.ibd | sed -r '/^\s*$/d' > out.$pop.ibd
done
This first loop doesn't print anything.
My second attempt is this:
for pop in $pops
do
awk '{if (a[$1]=~$pop && a[$3]=~$pop)
print $1"\t" $2 "\t" $3 "\t" $4"\t" $5 "\t" $6 "\t" $7 "\t" $8}' Roma_Czech.ibdne.ibd | sed -r '/^\s*$/d' > out.$pop.ibd
done
In this case it prints everything contained in the input file.
How can I fix this script?
Answer 1
Score: 4
A few issues with the current code:
- to use OS (e.g., bash) variables in an awk script, use the -v awk_var="$bash_var" construct
- =~ is not a valid operator in awk (the match operator is ~)
- you can define the output field separator as a tab (OFS="\t") so that you don't need to add an explicit "\t" between each output field
- the references to a[$1] and a[$3] don't make sense in this case since the array a[] is never created, let alone populated
- while the current definition of pops works in this case, you may want to consider using an array
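The first point in the list above is why the original loop printed nothing: inside single quotes the shell never expands $pop, so awk receives the characters $pop as the regex instead of pop1. A minimal sketch of the difference, using one sample line from the question rather than a file (the variable names literal and passed are only for illustration):

```shell
pop='pop1'
line='pop1_io 1 pop1_ei 2 1 62027313 63797977 3.047'

# Single quotes: the shell leaves $pop alone, so awk is handed the
# regex /$pop/ (not /pop1/), which does not match the sample data.
literal=$(printf '%s\n' "$line" | awk '$1 ~ /$pop/ && $3 ~ /$pop/')

# -v copies the shell value into an awk variable; "$1 ~ pop" then uses
# the string stored in pop as a dynamic regex.
passed=$(printf '%s\n' "$line" | awk -v pop="$pop" '$1 ~ pop && $3 ~ pop')

printf 'literal: [%s]\n' "$literal"
printf 'passed:  [%s]\n' "$passed"
```

The first printf shows empty brackets, while the second shows the whole sample line, since both columns contain pop1.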
Making some changes to OP's current code:
pops=('pop1' 'pop2' 'pop3')
for pop in "${pops[@]}"
do
awk -v pop="$pop" 'BEGIN {OFS="\t"} ($1~pop && $3~pop) {$1=$1; print}' inputfile.ibd > "out.$pop.ibd"
done
NOTES:
- assumes the input file has 8 space-delimited fields
- the $1=$1 causes the line to be rebuilt so that print can make use of the new OFS="\t"
- I'm not sure of OP's purpose for the sed -r; I'm leaving it out but OP can add it back into the mix as needed
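The effect of $1=$1 is easy to miss, so here is a small sketch of it in isolation. It assumes a shortened sample line and pipes the result through tr so that tabs show up as "|"; the variable names unchanged and rebuilt are only for illustration.

```shell
line='pop1_io 1 pop1_ei 2'

# Without touching a field, print emits $0 exactly as read, so OFS is never used.
unchanged=$(printf '%s\n' "$line" | awk 'BEGIN {OFS="\t"} {print}')

# Assigning to any field (even to itself) makes awk rebuild $0,
# joining the fields with the current OFS.
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN {OFS="\t"} {$1=$1; print}')

# Turn tabs into "|" to make them visible.
printf '%s\n' "$unchanged" | tr '\t' '|'   # pop1_io 1 pop1_ei 2
printf '%s\n' "$rebuilt"   | tr '\t' '|'   # pop1_io|1|pop1_ei|2
```

This is also why the explicit "\t" strings between every field in the original print statement are no longer needed.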
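This generates the following in out.pop1.ibd, tab-delimited (out.pop2.ibd and out.pop3.ibd are created but left empty for this sample input):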
pop1_io 1 pop1_ei 2 1 62027313 63797977 3.047
Assuming the only purpose of this for loop is to print out the matching rows from the input file, we can push the looping construct down into a single awk script, e.g.:
poplist='pop1:pop2:pop3'              # build a list of ":" delimited strings
awk -v poplist="${poplist}" '
BEGIN { OFS="\t"
        n=split(poplist,pops,":")     # split the "poplist" variable on the ":" delimiter and place results in the pops[] array
      }
      { for (i=1;i<=n;i++)            # loop through indices of the pops[] array
            if ($1~pops[i] && $3~pops[i]) {
               $1=$1
               print > ("out." pops[i] ".ibd")
               next
            }
      }
' inputfile.ibd
This also generates:
pop1_io 1 pop1_ei 2 1 62027313 63797977 3.047
Answer 2
Score: 1
I think this might be what you're trying to do, using any awk:
$ awk '
{
    for (i=3; i>=1; i-=2) {
        key = $i
        sub(/_.*/,"",key)
        out = key ".ibd"
        if ( !seen[key]++ ) {
            printf "" > out
        }
    }
}
$3 ~ ("^" key) {
    print > out
}
' file
$ head *.ibd
==> pop1.ibd <==
pop1_io 1 pop1_ei 2 1 62027313 63797977 3.047
==> pop2.ibd <==
==> pop3.ibd <==
Note that you don't need to provide a list like pop1 pop2 pop3; the tool just creates an output file for each of those prefixes that exist in the input. If you hit a "too many open files" error message, then change it to the following, which will be a bit slower as it's closing the output after every write:
$ awk '
{
    for (i=3; i>=1; i-=2) {
        key = $i
        sub(/_.*/,"",key)
        out = key ".ibd"
        if ( !seen[key]++ ) {
            printf "" > out
        }
    }
}
$3 ~ ("^" key) {
    print >> out
    close(out)
}
' file
Answer 3
Score: 1
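awk -F'_| *' -v list=pop1,pop2,pop3 '
BEGIN{
    n=split(list,arr,",")               # arr[1..n] holds the wanted prefixes
    for(i=1; i<=n; i++) pops[arr[i]]    # referencing an element creates it, so each prefix becomes a key of pops[]
}
# with "_" and spaces as field separators, $1 and $4 are the prefixes of the original columns 1 and 3
$1==$4 && $1 in pops { print $0 > ("out." $1 ".ibd")}
' file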
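This answer comes without an explanation, so here is a hedged sketch of what it appears to rely on: once the field separator matches either an underscore or a run of spaces, the population prefixes of the original columns 1 and 3 end up in $1 and $4. The sample line is taken from the question. The separator is written as '_| +' here rather than the answer's '_| *', an assumption made so the pattern can never match an empty string, and the variable name fields is only for illustration.

```shell
line='pop1_io 1 pop1_ei 2 1 62027313 63797977 3.047'

# Split on "_" or on one or more spaces, then label the first five fields.
fields=$(printf '%s\n' "$line" |
  awk -F'_| +' '{ print "$1=" $1, "$2=" $2, "$3=" $3, "$4=" $4, "$5=" $5 }')

printf '%s\n' "$fields"   # $1=pop1 $2=io $3=1 $4=pop1 $5=ei
```

Because $1 and $4 are now the bare prefixes, the test $1==$4 is an exact comparison rather than a regex match, so a prefix such as pop1 would not also match pop10.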