2023-03-04 08:58:11

Loop through columns, then transform cells, then show unique values in awk

Question

I'd like to transform some data using awk but need some help, please. For every column whose header starts with "sam" (the column positions are not known in advance), I want to keep everything before the first colon.

Input:

col1	col2	col3	col4	sam1	sam2	sam3
a	b	c	d	0/1:12	1/0:9	0/1:16
e	f	g	h	0/0:7	1/1:98	0/0:8

Desired output:

col1	col2	col3	col4	sam1	sam2	sam3
a	b	c	d	0/1	1/0	0/1
e	f	g	h	0/0	1/1	0/0

This is the best I've got so far... but it doesn't work.

awk -F"\t" '{ for(i=5; i<=NF; --i); split($i,a,":"); print a[1]}}' input > output

I know how to cut a column i.e. cut -d ':' -f2 but as far as I understand you can't combine cut with awk in a loop!

Then, I want to find all the unique values for columns starting with sam in the output file e.g.

0/1
1/0
0/0
1/1

I'm afraid I'm totally lost on an awk solution for that. I can do it in R but an awk solution would be preferred and much faster.

R solution:

output %>% pivot_longer(-c(col1:col4)) -> df_long
df_long %<>% select(value)
unique(df_long)

Answer 1

Score: 1

    $ awk 'NR==1 {for(i=1;i<=NF;i++) if($i~/^sam/) cols[i]} 
                 {for(i=1;i<=NF;i++) 
                    if(i in cols) 
                      {split($i,t,":"); $i=t[1]}}1' file | 
      column -t

    col1  col2  col3  col4  sam1  sam2  sam3
    a     b     c     d     0/1   1/0   0/1
    e     f     g     h     0/0   1/1   0/0

Or, if you are only interested in the unique values:

    $ awk 'NR==1 {for(i=1;i<=NF;i++) if($i~/^sam/) cols[i]; next} 
                 {for(i in cols) {split($i,t,":"); if(!vals[t[1]]++) print t[1]}}' file

    0/1
    1/0
    0/0
    1/1
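
The first command above splits on any whitespace and rejoins fields with a single space, which is why it is piped through column -t. If the output should stay tab-separated, as in the question, one possible variation is sketched below. The sample_data helper is a hypothetical stand-in for the input file, and FS = OFS = "\t" is an assumption that the real file is strictly tab-delimited.

```shell
# Hypothetical helper that prints the sample input from the question.
sample_data() {
    printf 'col1\tcol2\tcol3\tcol4\tsam1\tsam2\tsam3\n'
    printf 'a\tb\tc\td\t0/1:12\t1/0:9\t0/1:16\n'
    printf 'e\tf\tg\th\t0/0:7\t1/1:98\t0/0:8\n'
}

result=$(sample_data | awk '
    BEGIN { FS = OFS = "\t" }            # split and rejoin on tabs only
    NR == 1 {                            # header row: remember the sam* column numbers
        for (i = 1; i <= NF; i++)
            if ($i ~ /^sam/) cols[i]
    }
    NR > 1 {                             # data rows: keep the text before the first colon
        for (i in cols) sub(/:.*/, "", $i)
    }
    { print }                            # print every row, header included
')
printf '%s\n' "$result"
```

With a real file, replace the sample_data pipe with the file name as the last argument to awk.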

Answer 2

Score: 1

awk '
    BEGIN { FS = OFS = "\t" }
    NR==1 { while(i++<NF) if ($i ~ /^sam/) p[i] }
    NR>1 { for (i in p) { sub(/:.*$/,"",$i); u[$i] } }
    { print > "output" }
    END { for (i in u) print i > "unique" }
' input

use first row to populate a list with columns of interest
on subsequent rows, process relevant columns and copy amended values to hash
print each line to the file called "output"
at the end print the keys of the hash to a file called "unique"

awk's arrays are hashes so storing items as keys of an array gives the unique items
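
One thing worth knowing about this idiom: the order in which for (i in u) visits the keys is unspecified, so the unique values can come out in a different order from one awk implementation to another. A minimal sketch of the same key-based de-duplication with a stable order, piping the keys through sort, might look like this. The sample_data helper is a hypothetical stand-in for the input file, and the value is copied into v so the record itself is left untouched.

```shell
# Hypothetical helper that prints the sample input from the question.
sample_data() {
    printf 'col1\tcol2\tcol3\tcol4\tsam1\tsam2\tsam3\n'
    printf 'a\tb\tc\td\t0/1:12\t1/0:9\t0/1:16\n'
    printf 'e\tf\tg\th\t0/0:7\t1/1:98\t0/0:8\n'
}

result=$(sample_data | awk '
    BEGIN { FS = "\t" }
    NR == 1 {                            # header row: remember the sam* column numbers
        for (i = 1; i <= NF; i++)
            if ($i ~ /^sam/) p[i]
        next
    }
    {
        for (i in p) {
            v = $i                       # work on a copy of the field
            sub(/:.*/, "", v)            # keep the text before the first colon
            u[v]                         # referencing the key stores it once
        }
    }
    END { for (v in u) print v }         # key order is unspecified here
' | LC_ALL=C sort)                       # sort makes the order reproducible
printf '%s\n' "$result"
```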

Answer 3

Score: 1

You can do it all in one go fairly easily. For example, using gsub() to remove the :XX suffix from each field and a simple array to collect the unique sam values, you would have:

awk -F"\t" '{gsub(/:[^[:space:]]+/,"")} FNR>1 {for (i=5; i<=NF; i++) a[$i]++} END {for (i in a) print i}1' file

Example Use/Output

With your content in file you would have:

$ awk -F"\t" '{gsub(/:[^[:space:]]+/,"")} FNR>1 {for (i=5; i<=NF; i++) a[$i]++} END {for (i in a) print i}1' file
col1    col2    col3    col4    sam1    sam2    sam3
a       b       c       d       0/1     1/0     0/1
e       f       g       h       0/0     1/1     0/0
1/0
1/1
0/0
0/1

In Awk Script Form

You can put the contents in a simple script file and make it executable with chmod +x and then just provide the filename to read as an argument. For example, create sam.awk as follows:

#!/bin/awk -f

BEGIN { FS = "\t" }
{
  gsub(/:[^[:space:]]+/,"")
  print
}
FNR>1 {
  for (i=5; i&lt;=NF; i++)
    a[$i]++
}
END {
  for (i in a)
    print i
}

Now simply chmod +x sam.awk and execute ./sam.awk file to produce:

$ ./sam.awk file
col1    col2    col3    col4    sam1    sam2    sam3
a       b       c       d       0/1     1/0     0/1
e       f       g       h       0/0     1/1     0/0
1/0
1/1
0/0
0/1

Either way, as a one-liner or as a script is fine -- up to you entirely.
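
A caveat that may matter on real data: a gsub() applied to the whole record strips a colon suffix from every column, not only the sam ones. The sketch below compares that with a per-field sub() limited to fields 5 and up. The row, with a time-like value in col4, is invented for illustration, and the whole-record pattern is written with [^\t] on the assumption that the data is tab-separated.

```shell
# Hypothetical row whose fourth column also contains a colon.
row=$(printf 'a\tb\tc\t08:58\t0/1:12\t1/0:9')

# Whole-record substitution: the ":58" in column 4 is removed as well.
whole=$(printf '%s\n' "$row" | awk -F'\t' '{ gsub(/:[^\t]+/, "") } 1')

# Per-field substitution from field 5 onward: column 4 is left alone.
fields=$(printf '%s\n' "$row" | awk '
    BEGIN { FS = OFS = "\t" }
    { for (i = 5; i <= NF; i++) sub(/:.*/, "", $i) }
    1
')

printf '%s\n%s\n' "$whole" "$fields"
```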

Answer 4

Score: 0

"find all the unique values for columns starting with sam"

$ grep -o "[0-9]/[0-9]" inputfile | sort -u
0/0
0/1
1/0
1/1
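
Note that the grep pattern matches a digit/digit pair anywhere on a line, so it would also pick up such a pair in a non-sam column. A rough alternative that looks only at the sam columns, using the cut tool mentioned in the question, is sketched here. It assumes the file is tab-separated and that the sam columns start at field 5, as in the sample. The sample_data helper is a hypothetical stand-in for the input file.

```shell
# Hypothetical helper that prints the sample input from the question.
sample_data() {
    printf 'col1\tcol2\tcol3\tcol4\tsam1\tsam2\tsam3\n'
    printf 'a\tb\tc\td\t0/1:12\t1/0:9\t0/1:16\n'
    printf 'e\tf\tg\th\t0/0:7\t1/1:98\t0/0:8\n'
}

result=$(sample_data |
    tail -n +2 |          # skip the header row
    cut -f5- |            # keep fields 5 onward (assumed to be the sam columns)
    tr '\t' '\n' |        # one value per line
    cut -d: -f1 |         # text before the first colon
    LC_ALL=C sort -u)     # unique values in a stable order
printf '%s\n' "$result"
```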
