June 21, 2023, 22:44:48

Loop an AWK script with variables

Question

I'm trying to loop an AWK script that contains two conditions and a variable taken from a predefined list. The goal is to extract each line in which column one and column three both meet a condition: the names in the two columns must both partially match the same pattern.
My input file looks like this:

pop1_io	1	pop1_ei	2	1	62027313	63797977	3.047
pop1_eg	1	pop2_yu	2	1	74240214	78974955	3.827
pop3_ab	1	pop1_zx	2	1	160604473	163511425	4.04

The first script I wrote works perfectly if I manually type the name I need, but it doesn't work if I try to loop it and insert variables into the awk script.
Working one:

awk '{if ($1 ~ /pop1/ && $3 ~ /pop1/)
	print $1"\t" $2 "\t" $3 "\t" $4"\t" $5 "\t" $6 "\t" $7 "\t" $8}' inputfile.ibd | sed -r '/^\s*$/d' > pop1.ibd

Not working ones:

pops="pop1 pop2 pop3"
for pop in $pops
do
awk '{if ($1 ~ /$pop/ && $3 ~ /$pop/)
	print $1"\t" $2 "\t" $3 "\t" $4"\t" $5 "\t" $6 "\t" $7 "\t" $8}' inputfile.ibd | sed -r '/^\s*$/d' > out.$pop.ibd
done

This first script doesn't print anything.
My second attempt is this:

for pop in $pops
do
awk '{if (a[$1]=~$pop && a[$3]=~$pop)
	print $1"\t" $2 "\t" $3 "\t" $4"\t" $5 "\t" $6 "\t" $7 "\t" $8}' Roma_Czech.ibdne.ibd | sed -r '/^\s*$/d' > out.$pop.ibd
done

In this case it prints everything contained in the input file.
How could I fix this script?

Answer 1

Score: 4

A few issues with the current code:

to use OS (e.g., bash) variables in an awk script, use the -v awk_var="$bash_var" construct; inside the single-quoted script the shell never expands $pop, so awk is given the literal regex /$pop/, which matches none of these fields
=~ is not a valid operator in awk; the match operator is ~
you can define the output field separator as a tab (OFS="\t") so that you don't need to add an explicit "\t" between each output field
the references to a[$1] and a[$3] don't make sense in this case since the array a[] is never created, let alone populated
while the current definition of pops works in this case, you may want to consider using an array
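
The first point can be checked in isolation. The following is a minimal sketch, assuming a POSIX shell and any standard awk; the variable names literal and passed are made up for this illustration and are not part of OP's script. It counts how many rows each form of the test selects from one sample row.

```shell
#!/bin/sh
pop=pop1
line='pop1_io 1 pop1_ei 2 1 62027313 63797977 3.047'

# Single quotes stop the shell from expanding $pop, so awk is handed the
# literal regex text "$pop" and the sample row is not selected.
literal=$(printf '%s\n' "$line" |
    awk '$1 ~ /$pop/ && $3 ~ /$pop/ {n++} END {print n+0}')

# -v copies the shell value into an awk variable; "$1 ~ pop" then uses
# that value as a dynamic regex.
passed=$(printf '%s\n' "$line" |
    awk -v pop="$pop" '$1 ~ pop && $3 ~ pop {n++} END {print n+0}')

echo "literal=$literal passed=$passed"
```

Only the -v form should report a selected row, which matches OP's observation that the looped script printed nothing.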

Making some changes to OP's current code:

pops=('pop1' 'pop2' 'pop3')
for pop in "${pops[@]}"
do
    awk -v pop="$pop" 'BEGIN {OFS="\t"} ($1~pop && $3~pop) {$1=$1; print}' inputfile.ibd > "out.$pop.ibd"
done

NOTES:

assumes the input file has 8 whitespace-delimited fields (spaces or tabs)
the $1=$1 causes the line to be rebuilt so that print makes use of the new OFS="\t"
I'm not sure what the purpose of OP's sed -r is; I've left it out, but OP can add it back as needed
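
The effect of $1=$1 is easy to see on a single row. This is a small sketch, assuming a POSIX shell and any standard awk; the names row, unchanged and rebuilt are hypothetical and used only here.

```shell
#!/bin/sh
row='pop1_io 1 pop1_ei'

# Without touching a field, print emits $0 exactly as it was read,
# so the new OFS is never applied.
unchanged=$(printf '%s\n' "$row" | awk 'BEGIN {OFS="\t"} {print}')

# Assigning to any field (even to itself) makes awk rebuild $0,
# joining the fields with OFS.
rebuilt=$(printf '%s\n' "$row" | awk 'BEGIN {OFS="\t"} {$1=$1; print}')

printf 'unchanged: %s\nrebuilt:   %s\n' "$unchanged" "$rebuilt"
```

The first result keeps the original single spaces, while the second has tabs between the fields.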

This generates the following in out.pop1.ibd:

pop1_io 1       pop1_ei 2       1       62027313        63797977        3.047

Assuming the only purpose of the for loop is to print the matching rows from the input file, we can push the looping construct down into a single awk script, e.g.:

poplist='pop1:pop2:pop3'                     # build a list of ":" delimited strings
awk -v poplist="${poplist}" '
BEGIN { OFS="\t"
        n=split(poplist,pops,":")            # split the "poplist" variable on the ":" delimiter and place results in the pops[] array
      }
      { for (i=1;i<=n;i++)                   # loop through indices of the pops[] array
            if ($1~pops[i] && $3~pops[i]) {
               $1=$1
               print > ("out." pops[i] ".ibd")
               next
            }
      }
' inputfile.ibd

This also generates the following in out.pop1.ibd:

pop1_io 1       pop1_ei 2       1       62027313        63797977        3.047
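
One caveat with the ~ tests above: they are unanchored, so a name such as pop1 would also match a label like pop10_aa. The sketch below assumes that labels always have the form prefix_suffix, which is true of the sample rows but is an assumption about the real data; the pop10 row and the file name sample.ibd are invented for the demonstration. It compares the unanchored regex test with an exact prefix test built on index().

```shell
#!/bin/sh
workdir=$(mktemp -d)
sample="$workdir/sample.ibd"

printf 'pop1_io\t1\tpop1_ei\t2\n'    > "$sample"
printf 'pop10_aa\t1\tpop10_bb\t2\n' >> "$sample"   # hypothetical extra population

# Unanchored regex match: "pop1" is found inside "pop10_aa" as well.
unanchored=$(awk -v pop=pop1 '$1 ~ pop && $3 ~ pop {n++} END {print n+0}' "$sample")

# index() returns 1 only when the field starts with the literal text "pop1_".
anchored=$(awk -v pop=pop1 '
    index($1, pop "_") == 1 && index($3, pop "_") == 1 {n++}
    END {print n+0}' "$sample")

echo "unanchored=$unanchored anchored=$anchored"
rm -rf "$workdir"
```

With index() the population name is compared as plain text, so regex metacharacters in a name would not need escaping either.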

Answer 2

Score: 1

I think this might be what you're trying to do, using any awk:

$ awk '
    {
        for (i=3; i>=1; i-=2) {
            key = $i
            sub(/_.*/,"",key)
            out = key ".ibd"
            if ( !seen[key]++ ) {
                printf "" > out
            }
        }
    }
    $3 ~ ("^" key) {
        print > out
    }
' file
$ head *.ibd
==> pop1.ibd <==
pop1_io 1   pop1_ei 2   1   62027313    63797977    3.047
==> pop2.ibd <==
==> pop3.ibd <==

Note that you don't need to provide a list like pop1 pop2 pop3; the script simply creates an output file for each prefix that exists in the input. If you hit a "too many open files" error, change it to the following, which will be a bit slower because it closes the output file after every write:

$ awk '
    {
        for (i=3; i>=1; i-=2) {
            key = $i
            sub(/_.*/,"",key)
            out = key ".ibd"
            if ( !seen[key]++ ) {
                printf "" > out
            }
        }
    }
    $3 ~ ("^" key) {
        print >> out
        close(out)
    }
' file

Answer 3

Score: 1

awk -F'_|[ \t]+' -v list=pop1,pop2,pop3 '
    BEGIN{
        n=split(list,arr,",")
        for(i=1; i<=n; i++) pops[arr[i]]      # referencing an element creates it, so pops[] acts as a set
    }
    # "_" is also a separator, so $1 and $4 are the prefixes of the original columns 1 and 3
    $1==$4 && $1 in pops { print $0 > ("out." $1 ".ibd")}
' file
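
The $1==$4 comparison depends on how the row is split. As a rough illustration, assuming a separator of either an underscore or a run of spaces or tabs (the variable names row and fields are made up here), the first five fields of a sample row can be printed like this:

```shell
#!/bin/sh
row='pop1_io 1 pop1_ei 2 1 62027313 63797977 3.047'

# Splitting on "_" as well as whitespace turns "pop1_io" into two fields,
# so the prefix of the third original column lands in $4.
fields=$(printf '%s\n' "$row" |
    awk -F'_|[ \t]+' '{print $1 "," $2 "," $3 "," $4 "," $5}')

echo "$fields"
```

Here $1 and $4 both hold pop1, which is why comparing them selects rows whose two labels share a prefix.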

