Loop an AWK script with variables

Question

I'm trying to loop an AWK script that contains two conditions and a variable taken from a given list. The goal is to extract each line where column one and column three both meet a condition (the names in the two columns have to partially match the same text).
My input file looks like this:

pop1_io	1	pop1_ei	2	1	62027313	63797977	3.047
pop1_eg	1	pop2_yu	2	1	74240214	78974955	3.827
pop3_ab	1	pop1_zx	2	1	160604473	163511425	4.04

The first script I wrote works perfectly if I manually type the name I need, but it doesn't work if I try to loop it and insert variables into the awk script.
The working one:

awk '{if ($1 ~ /pop1/ && $3 ~ /pop1/)
	print $1"\t" $2 "\t" $3 "\t" $4"\t" $5 "\t" $6 "\t" $7 "\t" $8}' inputfile.ibd | sed -r '/^\s*$/d' > pop1.ibd

The non-working ones:

pops="pop1 pop2 pop3"

for pop in $pops
do
awk '{if ($1 ~ /$pop/ && $3 ~ /$pop/)
	print $1"\t" $2 "\t" $3 "\t" $4"\t" $5 "\t" $6 "\t" $7 "\t" $8}' inputfile.ibd | sed -r '/^\s*$/d' > out.$pop.ibd
done

This first script doesn't print anything.
My second attempt is this:

for pop in $pops
do
awk '{if (a[$1]=~$pop && a[$3]=~$pop)
	print $1"\t" $2 "\t" $3 "\t" $4"\t" $5 "\t" $6 "\t" $7 "\t" $8}' Roma_Czech.ibdne.ibd | sed -r '/^\s*$/d' > out.$pop.ibd
done

In this case it prints everything contained in the first file.
How could I fix this script?

Answer 1

Score: 4

A few issues with the current code:

  • to use OS (e.g., bash) variables in an awk script, use the -v awk_var="$bash_var" construct
  • =~ is an invalid operator in awk
  • you can define the output field separator as a tab (OFS="\t") so that you don't need to add an explicit "\t" between each output field
  • the references to a[$1] and a[$3] don't make sense in this case since the array a[] is never created, let alone populated
  • while the current definition of pops works in this case, you may want to consider using an array
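
As a rough illustration of the first point, the sketch below compares the quoted /$pop/ form with the -v form. The file name sample.ibd, the temporary directory, and the two sample rows are assumptions made up for this demo, not part of the original post.

```shell
# Hypothetical demo data: sample.ibd is an illustrative name, not from the question.
tmpdir=$(mktemp -d)
printf 'pop1_io\t1\tpop1_ei\t2\npop1_eg\t1\tpop2_yu\t2\n' > "$tmpdir/sample.ibd"

pop=pop1

# Inside single quotes the shell never expands $pop, so awk receives the
# literal regex /$pop/ (in most awks: an end-of-line anchor followed by
# "pop"). That cannot match any of these names, so nothing is printed.
broken=$(awk '$1 ~ /$pop/ && $3 ~ /$pop/' "$tmpdir/sample.ibd")

# With -v the shell value is copied into an awk variable, and "$1 ~ pop"
# treats the variable's contents as a dynamic regex.
fixed=$(awk -v pop="$pop" '$1 ~ pop && $3 ~ pop' "$tmpdir/sample.ibd")

printf 'broken: [%s]\n' "$broken"
printf 'fixed:  [%s]\n' "$fixed"

rm -rf "$tmpdir"
```

Only the first sample row has pop1 in both columns, so that is the only row the -v version should keep.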

Making some changes to OP's current code:

pops=('pop1' 'pop2' 'pop3')

for pop in "${pops[@]}"
do
    awk -v pop="$pop" 'BEGIN {OFS="\t"} ($1~pop && $3~pop) {$1=$1; print}' inputfile.ibd > "out.$pop.ibd"
done

NOTES:

  • assumes the input file has 8 whitespace-delimited fields (awk's default field splitting handles both spaces and tabs)
  • the $1=$1 causes the line to be rebuilt so that the print can make use of the new OFS="\t"
  • I'm not sure of OP's purpose for the sed -r; I'm leaving it out but OP can add it back into the mix as needed
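
The effect of $1=$1 can be checked with a small sketch like the one below. The one-line input is made up for illustration; od -c is used only to make the separators visible.

```shell
# Illustrative single-space-separated line (not taken from the input file).
line='pop1_io 1 pop1_ei'

# No field is modified, so print emits $0 untouched: the original
# single spaces survive even though OFS is a tab.
unchanged=$(printf '%s\n' "$line" | awk 'BEGIN {OFS="\t"} {print}')

# Assigning to any field (even to itself) makes awk rebuild $0,
# joining the fields with the current OFS.
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN {OFS="\t"} {$1=$1; print}')

# od -c shows tabs as \t, which makes the difference easy to see.
printf '%s\n' "$unchanged" | od -c
printf '%s\n' "$rebuilt" | od -c
```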

This generates:

pop1_io 1       pop1_ei 2       1       62027313        63797977        3.047

Assuming the only purpose of this for loop is to print the matching rows from the input file, we can push the looping construct down into a single awk script, e.g.:

poplist='pop1:pop2:pop3'                     # build a list of ":" delimited strings

awk -v poplist="${poplist}" '
BEGIN { OFS="\t"
        n=split(poplist,pops,":")            # split the "poplist" variable on the ":" delimiter and place results in the pops[] array
      }
      { for (i=1;i<=n;i++)                   # loop through indices of the pops[] array
            if ($1~pops[i] && $3~pops[i]) {
               $1=$1
               print > ("out." pops[i] ".ibd")
               next
            }
      }
' inputfile.ibd

This also generates:

pop1_io 1       pop1_ei 2       1       62027313        63797977        3.047
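
One caveat worth keeping in mind: $1~pop is an unanchored regex match, so pop1 would also match a name such as pop10_aa. If that matters for your data, one possible variation (a sketch only, assuming every name has the form prefix_suffix) is to anchor the pattern at the start of the field and include the underscore. The pop10_* names below are invented for this demo.

```shell
pop=pop1

# Two made-up rows: one genuine pop1 pair and one pop10 pair.
input=$(printf 'pop1_io\t1\tpop1_ei\npop10_aa\t1\tpop10_bb\n')

# Unanchored: "pop1" is found inside "pop10_aa", so both rows pass.
loose=$(printf '%s\n' "$input" | awk -v pop="$pop" '$1 ~ pop && $3 ~ pop')

# Anchored: the field must start with "pop1_", so only the first row passes.
strict=$(printf '%s\n' "$input" | awk -v pop="$pop" '
    BEGIN { re = "^" pop "_" }       # build the anchored regex once
    $1 ~ re && $3 ~ re
')

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```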

Answer 2

Score: 1

I think this might be what you're trying to do, using any awk:

$ awk '
    {
        for (i=3; i>=1; i-=2) {
            key = $i
            sub(/_.*/,"",key)
            out = key ".ibd"
            if ( !seen[key]++ ) {
                printf "" > out
            }
        }
    }
    $3 ~ ("^" key) {
        print > out
    }
' file

$ head *.ibd
==> pop1.ibd <==
pop1_io 1   pop1_ei 2   1   62027313    63797977    3.047

==> pop2.ibd <==

==> pop3.ibd <==

Note that you don't need to provide a list like pop1 pop2 pop3; the tool just creates an output file for each of those prefixes that exists in the input. If you hit a "too many open files" error message, change it to the following, which will be a bit slower since it closes the output after every write:

$ awk '
    {
        for (i=3; i>=1; i-=2) {
            key = $i
            sub(/_.*/,"",key)
            out = key ".ibd"
            if ( !seen[key]++ ) {
                printf "" > out
            }
        }
    }
    $3 ~ ("^" key) {
        print >> out
        close(out)
    }
' file

Answer 3

Score: 1

awk -F'_| *' -v list=pop1,pop2,pop3 '
    BEGIN{
        n=split(list,arr,",")
        for(i=1; i<=n; i++) pops[arr[i]]
    }
    $1==$4 && $1 in pops { print $0 > ("out." $1 ".ibd")}
' file

huangapple
  • Posted on June 21, 2023 at 22:44:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76524559.html