February 10, 2023 03:59:10


AWK: print ALL rows with MAX value in one field Per the other field including Identical Rows with Max value AND multiple columns

Question

I appreciate the many prompt solutions I got from many contributors on my previous question!!! (https://stackoverflow.com/questions/75392860/awk-print-all-rows-with-max-value-in-one-field-per-the-other-field-including-id/75401868#75401868)

This question includes data with one more column. I'd like to keep the rows with the highest value in column 2 per value of column 1, including identical rows with the max value, and print all columns.


Data

a	130	data1
a	55	data2
a	66	data3
b	88	data4
b	99	data5
b	99	data6
c	110	data7
c	130	data8
c	130	data9

Desired output

a	130	data1
b	99	data5
b	99	data6
c	130	data8
c	130	data9

Code from @jared_mamrot works perfectly and prints out all columns.

awk 'NR==FNR{if($2 > max[$1]){max[$1]=$2}; next} max[$1] == $2' file file
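Below is a commented sketch of the same two-pass idea, wrapped in a shell function so it can be reused. The function name max_rows_2pass is hypothetical. The !($1 in max) check and the + 0 conversions are additions that the one-liner above does not have. An unset array element compares as 0, so without the check a key whose values are all negative would never get a maximum.

```shell
#!/bin/sh
# max_rows_2pass (hypothetical name): print every line whose 2nd field is
# the maximum 2nd field for its 1st-field key, keeping all columns.
# The file is read twice, so pass a regular file, not a pipe.
max_rows_2pass() {
    awk '
        # Pass 1: NR==FNR is true only while reading the first copy.
        NR == FNR {
            # Start each key at its first value, then keep the largest.
            if (!($1 in max) || $2 + 0 > max[$1]) max[$1] = $2 + 0
            next
        }
        # Pass 2: a bare pattern prints the whole line when it is true.
        $2 + 0 == max[$1]
    ' "$1" "$1"
}
```

Usage: max_rows_2pass file. Because awk reads the file twice, data arriving from a pipe has to be saved to a file first.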

Code @Andre Wildberg provided also works perfectly and prints out all columns.

awk 'arr[$1] < $2{arr[$1] = $2}
       arr[$1] == $2{n[$1,arr[$1]]++; line[$1,arr[$1],n[$1,arr[$1]]] = $0}
       END{for(i in arr){
             j=0; do{j++; print line[i,arr[i],j]} while(j < n[i,arr[i]])}}' file

The awk script below by @Ed Morton also works perfectly for my previous data with 2 columns. It prints two columns: key and val.

My further question: when I have multiple columns in the data, how should I modify this script to print all columns?

sort file | awk '
    { cnt[$1,$2]++; max[$1]=$2 }
    END { for (key in max) { val=max[key]; for (i=1; i<=cnt[key,val]; i++) print key, val } }
'
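Watch the sort step when adapting this script to values like those in the sample data. Plain sort compares whole lines as text, so "130" sorts before "55" and "66". As a result, max[$1]=$2 keeps the last value in text order rather than the numeric maximum. The small check below compares the two orderings. The function names are hypothetical, and LC_ALL=C is set only to make the collation order reproducible:

```shell
#!/bin/sh
# Keep the last $2 per key after a plain (text) sort, as the script above does.
last_after_text_sort() {
    printf 'a\t130\na\t55\na\t66\n' | LC_ALL=C sort |
        awk '{ max[$1] = $2 } END { print max["a"] }'
}
# Same, but sort by key and then numerically (ascending) on column 2.
last_after_numeric_sort() {
    printf 'a\t130\na\t55\na\t66\n' | LC_ALL=C sort -k1,1 -k2,2n |
        awk '{ max[$1] = $2 } END { print max["a"] }'
}
```

last_after_text_sort prints 66, while last_after_numeric_sort prints 130.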

Thank you all for the great help!!!

Answer 1

Score: 4


Using any awk and sort:

$ sort -k1,1 -k2,2nr file | awk '!seen[$1]++{max=$2} $2==max'
a   130 data1
b   99  data5
b   99  data6
c   130 data8
c   130 data9

or:

$ sort -k1,1 -k2,2nr file | awk '$1!=prev{prev=$1; max=$2} $2==max'
a   130 data1
b   99  data5
b   99  data6
c   130 data8
c   130 data9
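Both one-liners depend on the order that sort produces. The -k1,1 key compares on the first field only, and -k2,2nr orders each group by the second field numerically, largest first. So the first line of each group carries that group's maximum, and any equal maxima follow it directly. The sketch below shows just the intermediate sorted stream for the sample data. The function name show_sorted is hypothetical, and LC_ALL=C is added only to make the ordering reproducible:

```shell
#!/bin/sh
# Print the sample data after sorting by key, then by value (largest first).
show_sorted() {
    printf 'a\t130\tdata1\na\t55\tdata2\na\t66\tdata3\nb\t88\tdata4\nb\t99\tdata5\nb\t99\tdata6\nc\t110\tdata7\nc\t130\tdata8\nc\t130\tdata9\n' |
        LC_ALL=C sort -k1,1 -k2,2nr
}
```

For the sample data, show_sorted emits these tab-separated lines, one per line: a 130 data1, a 66 data3, a 55 data2, b 99 data5, b 99 data6, b 88 data4, c 130 data8, c 130 data9, c 110 data7. Ties such as the two b 99 rows fall back to a comparison of the whole line, which is why data5 comes before data6.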

original script before realising I'd over-thought it:

$ sort -k1,1 -k2,2nr file | awk '!seen[$1]++{key=$1; max=$2} $1==key && $2==max'
a   130 data1
b   99  data5
b   99  data6
c   130 data8
c   130 data9

The value of seen[$1]++ is 0 the first time any given value of $1 appears in the input, and some incremental non-zero number when that same $1 appears again. So, the value of !seen[$1]++ is 1 (i.e. true in a conditional context) the first time a given $1 is seen in the input, and 0 (false) afterwards. So, the first time a appears as $1 we set key to a and max to whatever value $2 has, i.e. 130 in this case. That's it for the involvement of !seen["a"]++.

From then on we just print every line for which $1 is a and $2 is 130, which in this case is just the first line of input.

Then the same happens when b is first seen as $1.
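To make the explanation above concrete, the sketch below prints three things for each input line: the key, the value of !seen[$1]++, and the counter after the increment. The function name trace_seen and its input are made up for illustration:

```shell
#!/bin/sh
# For each line: key, value of !seen[$1]++ (1 = first time), counter after.
trace_seen() {
    printf 'a 130\na 55\nb 99\na 66\nb 88\n' |
        awk '{ first = !seen[$1]++; print $1, first, seen[$1] }'
}
```

trace_seen prints a 1 1, a 0 2, b 1 1, a 0 3 and b 0 2 on separate lines. The expression is 1 only for the first a and the first b, even though the b lines are not adjacent.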

Answer 2

Score: 3


You just need one additional associative array that stores each whole row ($0) as the value, keyed by the first 2 columns plus a running counter computed in the cnt array:

awk '{
   map[$1,$2,++cnt[$1,$2]] = $0
   max[$1] = ($2 > max[$1] ? $2 : max[$1])
}
END {
   for (key in max) {
      val = max[key]
      for (i=1; i<=cnt[key,val]; i++)
         print map[key,val,i]
   }
}' file
a   130 data1
b   99  data5
b   99  data6
c   130 data8
c   130 data9

There is no need to sort the file for this awk solution.
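One caveat: for (key in max) visits the keys in an unspecified order, so not every awk is guaranteed to produce the a, b, c order shown above. The sketch below keeps the same single-pass idea but also records the order in which keys first appear, and it forces a numeric comparison with + 0. Both of these are additions rather than part of the answer, and the function name max_rows_ordered is hypothetical:

```shell
#!/bin/sh
# max_rows_ordered (hypothetical name): single pass, no sort needed.
# Prints all rows holding the per-key maximum of column 2, with keys in
# the order they first appear in the input. Reads files or stdin.
max_rows_ordered() {
    awk '
        {
            v = $2 + 0                        # force a numeric comparison
            if (!($1 in max)) {               # first time this key appears
                keys[++nkeys] = $1            # remember the key order
                max[$1] = v
            } else if (v > max[$1]) {
                max[$1] = v
            }
            line[$1, v, ++cnt[$1, v]] = $0    # keep the whole row
        }
        END {
            for (k = 1; k <= nkeys; k++) {
                key = keys[k]
                for (i = 1; i <= cnt[key, max[key]]; i++)
                    print line[key, max[key], i]
            }
        }
    ' "$@"
}
```

Usage: max_rows_ordered file, or pipe the data into it. Like the answer, it keeps every row in memory until the end of input.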

Answer 3

Score: 2


Assuming there may be more than 3 fields to a row:

$ cat file
a   130 data1
a   55  data2
a   66  data3
b   88  data4
b   99  data5
b   99  data6
c   110 data7
c   130 data8
c   130 data9 data10 data11

One idea for modifying the current awk code:

awk '
    { key=$1; val=$2+0                  # save 1st two fields (+0 makes val a number)
      $1=$2=""                          # clear 1st two fields
      gsub(/^[[:space:]]+/,"")          # remove leading white space from line
      ++cnt[key,val]
      max[key]=(val > max[key] ? val : max[key])
      row[key,val,cnt[key,val]]=$0      # save rest of line
    }
END { for (key in max) { 
          val=max[key]
          for (i=1; i<=cnt[key,val]; i++) 
              print key, val, row[key,val,i]
      }
    }
'  file

This generates the following (the order of the keys comes from for (key in max) and is not guaranteed):

a 130 data1
b 99 data5
b 99 data6
c 130 data8
c 130 data9 data10 data11
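A note on comparisons such as val > max[key]. Awk compares numerically only when both operands are numbers, numeric strings (such as unmodified input fields) or uninitialized. As soon as one side is a plain string, the comparison is done as text, and then "55" > "130" is true because "5" sorts after "1". Adding + 0 to a value forces it to be a number. The small sketch below shows both kinds of comparison on string constants (the function name compare_demo is hypothetical):

```shell
#!/bin/sh
# Compare "55" and "130" first as strings, then as numbers.
compare_demo() {
    awk 'BEGIN {
        a = "55"; b = "130"                # string constants are plain strings
        string_result = (a > b)            # text comparison: "5" > "1", so 1
        numeric_result = (a + 0 > b + 0)   # numeric: 55 > 130 is false, so 0
        print string_result
        print numeric_result
    }'
}
```

compare_demo prints 1 and then 0.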

Answer 4

Score: 2


awk '
    $1 != firstcol{ firstcol=$1; max=$2; map[NR]=$0 }
    $1 == firstcol{
        if($2>max){ map[NR--]=$0; max=$2 }
        if($2==max) map[NR]=$0
    }
    END{
        for(i in map) print map[i]
    }
' inputfile
a   130 data1
b   99  data5
b   99  data6
c   130 data8
c   130 data9
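The script above decrements NR so that later lines overwrite earlier entries. That works for this sample but is fragile. With the input a 55, a 66, a 130 in that order, it stores the a 130 line under two different NR values and prints it twice. It also assumes that lines with the same $1 are adjacent, and for (i in map) does not guarantee NR order. The sketch below is a streaming alternative under the same adjacency assumption: it buffers only the current group's maximum rows and prints them when the key changes. The names max_rows_grouped and flush are hypothetical:

```shell
#!/bin/sh
# max_rows_grouped (hypothetical name): streaming version that assumes all
# lines with the same $1 are adjacent (e.g. the input is sorted on $1).
max_rows_grouped() {
    awk '
        # Print the rows buffered for the group that just ended.
        function flush(   i) {
            for (i = 1; i <= n; i++) print buf[i]
            n = 0
        }
        $1 != prev {              # a new key starts a new group
            flush()
            prev = $1
            max = $2 + 0
        }
        $2 + 0 > max {            # new maximum: forget the earlier rows
            max = $2 + 0
            n = 0
        }
        $2 + 0 == max { buf[++n] = $0 }
        END { flush() }
    ' "$@"
}
```

Usage: sort -k1,1 inputfile | max_rows_grouped, or max_rows_grouped inputfile when lines with the same first field are already adjacent. Output follows input order, and only the current group is held in memory.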

Answer 5

Score: 1


The same Ruby approach as for the two-column data works with minor adjustments:

ruby -e '
grps=$<.read.split(/\R/).
	group_by{|line| line[/^\S+/]}
# {"a"=>["a   130 data1", "a   55  data2", "a   66  data3"], "b"=>["b   88  data4", "b   99  data5", "b   99  data6"], "c"=>["c   110 data7", "c   130 data8", "c   130 data9"]}
maxes=grps.map{|k,v| v.max_by{|s| s.split[1].to_f}}.map{|s| s.split[0..1] }
# [["a", "130"], ["b", "99"], ["c", "130"]]
grps.values.flatten.each{|s| puts s if maxes.include?(s.split[0..1])}
' file

Prints:

a   130 data1
b   99  data5
b   99  data6
c   130 data8
c   130 data9

Once you start getting into 3 or more columns to manage, it is easier to use Ruby (or Perl, Python, etc.) because of their support for slicing, grouping and joining arrays.
