2023年1月9日 11:05:47go评论135阅读模式

英文:

Countif like function in AWK with field headers

问题

我正在寻找一种方法来计算 CSV 文件中字段范围内值出现的次数，类似于 Excel 中的 COUNTIF，尽管我希望如果可能的话使用 awk 命令。

因此，第6列应该有值的范围，第7列应该有该值在第7列中出现的次数，如下所示：

awk -F, '{print $0}' file3
f1,f2,f3,f4,f5,test
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD
row2_1,row2_2,row2_3,AWERF,row2_5,AWER
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ

awk -F, '{print $6}' file3
test
SBCD
AWER
ASDF
ASDQ
ASDQ

我想要的是：

f1,f2,f3,f4,f5,test,count
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD,1
row2_1,row2_2,row2_3,AWERF,row2_5,AWER,1
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF,1
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ,2
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,2

要实现这个目标，您可以使用以下 awk 命令：

awk -F, -v OFS=, 'NR==1{ print $0, "count" }
NR>1{ count[$6]++ }
{ print $0, count[$6] }' file3

这将在第一行添加一个名为 "count" 的新字段，并在每行的最后一列显示相应的计数值。

英文:

I am looking for a way of counting the number of times a value in a field appears in a range of fields in a csv file much the same as countif in excel although I would like to use an awk command if possible.

So column 6 should have the range of values and column 7 would have the times the value appears in column 7, as per below

&gt;awk -F, &#39;{print $0}&#39; file3
f1,f2,f3,f4,f5,test
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD
row2_1,row2_2,row2_3,AWERF,row2_5,AWER
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ

&gt;awk -F, &#39;{print $6}&#39; file3
test
SBCD
AWER
ASDF
ASDQ
ASDQ

What i want is:

f1,f2,f3,f4,f5,test,count
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD,1
row2_1,row2_2,row2_3,AWERF,row2_5,AWER,1
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF,1
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ,2
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,2

#adds field name count that I want:

 awk -F, -v OFS=, &#39;NR==1{ print $0, &quot;count&quot;}
 NR&gt;1{ print $0}&#39; file3

Ho do I get the output I want?

I have tried this from previous/similar question but no joy,

&gt;awk -F, &#39;NR&gt;1{c[$6]++;l[NR&gt;1]=$0}END{for(i=0;i++&lt;NR;){split(l[i],s,&quot;,&quot;);print l[i]&quot;,&quot;c展开收缩]}}&#39; file3
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,
,
,
,
,
,

very similar question to this one
similar python related Q, for my ref

答案1

得分: 1

你没有正确复制链接问题中的代码。为什么要将 l[NR] 改为 l[NR>1] 呢？另一方面，你应该将 s[1] 改为 s[6]，因为第六个字段包含了你要计数的键：

awk -F, 'NR>1{c[$6]++;l[NR]=$0}END{for(i=0;i++<NR;){split(l[i],s,",");print l[i]", "c

展开收缩

]}}'

你还可以输出具有新字段名称的标题：

awk -F, -vOFS=, 'NR==1{print $0,"count"}NR>1{c[$6]++;l[NR]=$0}END{for(i=0;i++<NR;){split(l[i],s,",");print l[i],c

展开收缩

]}}'

英文:

You did not copy the code from the linked question properly. Why change l[NR] to l[NR>1] at all? On the other hand, you should change s[1] to s[6] since it's the sixth field that has the key you're counting:

awk -F, &#39;NR&gt;1{c[$6]++;l[NR]=$0}END{for(i=0;i++&lt;NR;){split(l[i],s,&quot;,&quot;);print l[i]&quot;,&quot;c展开收缩]}}&#39;

You can also output the header with the new field name:

awk -F, -vOFS=, &#39;NR==1{print $0,&quot;count&quot;}NR&gt;1{c[$6]++;l[NR]=$0}END{for(i=0;i++&lt;NR;){split(l[i],s,&quot;,&quot;);print l[i],c展开收缩]}}&#39;

答案2

得分: 1

我会使用 GNU AWK 来完成这个任务，如下所示，假设 file.txt 的内容如下：

f1,f2,f3,f4,f5,test
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD
row2_1,row2_2,row2_3,AWERF,row2_5,AWER
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ

然后运行以下代码：

awk 'BEGIN{FS=OFS=","}NR==1{print $0,"count";next}FNR==NR{arr[$6]+=1;next}FNR>1{print $0,arr[$6]}' file.txt file.txt

输出如下：

f1,f2,f3,f4,f5,test,count
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD,1
row2_1,row2_2,row2_3,AWERF,row2_5,AWER,1
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF,1
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ,2
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,2

解释：这是一个两遍的方法，因此 file.txt 出现了两次。我告诉 GNU AWK , 是字段分隔符 (FS) 也是输出字段分隔符 (OFS)，然后对于第一行（标题行），我将其 print 出来，后跟 count，并指示 GNU AWK 前往下一行，因此对于第一行没有做其他处理。然后在第一遍遍历时，即全局行号 (NR) 等于文件中的行号 (FNR) 时，我统计第六个字段中的值的出现次数，并将它们作为数组 arr 中的值存储，然后指示 GNU AWK 前往下一行，因此在此遍历中没有做其他处理。在第二遍遍历中，对于所有大于 1 的行 (FNR>1)，我 print 整行 ($0)，后跟数组 arr 中相应的值。

(在 GNU Awk 5.0.1 中测试过)

英文:

I would harness GNU AWK for this task following way, let file.txt content be

f1,f2,f3,f4,f5,test
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD
row2_1,row2_2,row2_3,AWERF,row2_5,AWER
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ

then

awk &#39;BEGIN{FS=OFS=&quot;,&quot;}NR==1{print $0,&quot;count&quot;;next}FNR==NR{arr[$6]+=1;next}FNR&gt;1{print $0,arr[$6]}&#39; file.txt file.txt

gives output

f1,f2,f3,f4,f5,test,count
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD,1
row2_1,row2_2,row2_3,AWERF,row2_5,AWER,1
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF,1
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ,2
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,2

Explanation: this is two-pass approach, hence file.txt appears twice. I inform GNU AWK that , is both field separator (FS) and output field separator (OFS), then for first line (header) I print it followed by count and instruct GNU AWK to go to next line, so nothing other is done regarding 1st line, then for first pass, i.e. where global number of line (NR) is equal to number of line in file (FNR) I count number of occurences of values in 6th field and store them as values in array arr, then instruct GNU AWK to get to next line, so onthing other is done in this pass. During second pass for all lines after 1st (FNR>1) I print whole line ($0) followed by corresponding value from array arr

(tested in GNU Awk 5.0.1)

答案3

得分: 1

以下是您要求的代码部分的中文翻译：

一种 `awk` 的想法：

    awk '
    BEGIN { FS=OFS="," }                        # 将输入/输出字段分隔符定义为逗号
          { lines[NR]=$0
            if (NR==1) next
            col6[NR]=$6                         # 复制字段6，这样我们就不必在END块中解析lines[]的内容
            cnt[$6]++
          }
    END   { for (i=1;i<=NR;i++)
                print lines[i], (i==1 ? "count" : cnt[col6[i]] )
          }
    ' file3

这将生成：

    f1,f2,f3,f4,f5,test,count
    row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD,1
    row2_1,row2_2,row2_3,AWERF,row2_5,AWER,1
    row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF,1
    row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ,2
    row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,2

英文:

One awk idea:

awk &#39;
BEGIN { FS=OFS=&quot;,&quot; }                        # define input/output field delimiters as comma
      { lines[NR]=$0
        if (NR==1) next
        col6[NR]=$6                         # copy field 6 so we do not have to parse the contents of lines[] in the END block
        cnt[$6]++
      }
END   { for (i=1;i&lt;=NR;i++)
            print lines[i], (i==1 ? &quot;count&quot; : cnt[col6[i]] )
      }
&#39; file3

This generates:

f1,f2,f3,f4,f5,test,count
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD,1
row2_1,row2_2,row2_3,AWERF,row2_5,AWER,1
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF,1
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ,2
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,2

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

在AWK中带有字段标题的类似Countif的函数

问题

答案1

答案2

答案3

合并 Linux 行，如果列少于 X。

使用awk循环文件并为两列打印新行的方法

如何更新 Awk 命令以使其与更新版本的 MacOS 兼容？

在使用sed插入包含4个制表符的字符串时，变量扩展失败。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论