获取表中某列上的行的迭代次数和总计数。

huangapple go评论67阅读模式
英文:

to get iteration count and total count over the rows from a column in table

问题

We have a database table (source) with some million records in a table: Sample data like (extracted to a text file for sample)

 denied the payment
 the payment successful and incident reported successful
 Incident is been reported

While trying to get the distinct word count on these 3 records, we have replaced spaces with a new line character and then sorted and uniqued them.

sed 's/ /\n/g' file | sort | uniq -c >> new.txt 
Output: 
denied        1
the           2
payment       2
successful    2
Incident      2
is            1
been          1
reported      2

How can we also get the number of rows for the above output, something like:

values   iteration count   count of rows

denied         1              1
the            2              2
payment        2              2 
successful     2              1  (Although this word is two times but available only in 1 row)
Incident       2              2
is             1              1
been           1              1
reported       2              2
英文:

We have a database table(source) with some million records in a table : Sample data like (extracted to a text file for sample)

 denied the payment
 the payment successfull and incident reported successfull
 Incident is been reported

while trying to get the distint words count on these 3 records . have replaced space with new line character and then sort uniq we have done .

sed 's/ /\n/g' file|sort|uniq -c >> new.txt 
output: 
denied   1
the      2
payment  2
successfull 2
Incident 2
is       1
been     1
reported 2

how can we also get number of rows for the above output: some thing like

values iterationcount    countofrows

denied       		1   	1
the          		2   	2
payment      		2   	2 
successfull  		2   	1  (Although this word is two times but available only in 1 row )
Incident     		2   	2
is           		1   	1
been         		1   	1
reported     		2   	2

答案1

得分: 2

以下是您要翻译的代码部分:

BEGIN { OFS="\t" }
{
    delete seen
    for ( i=1; i<=NF; i++ ) {
        wordCnt[$i]++
        if ( !seen[$i]++ ) {
            rowCnt[$i]++
        }
    }
}
END {
    print "values", "iterationcount", "countofrows"
    for ( word in wordCnt ) {
        print word, wordCnt[word], rowCnt[word]
    }
}
$ awk -f tst.awk file | column -s $'\t' -t
values       iterationcount  countofrows
payment      2               2
incident     1               1
the          2               2
and          1               1
been         1               1
reported     2               2
successfull  2               1
Incident     1               1
is           1               1
denied       1               1
英文:

You can't start by converting blanks to newlines and then try to add getting counts per the original lines.

Using any awk that supports delete array (which is most of them):

$ cat tst.awk
BEGIN { OFS=&quot;\t&quot; }
{
    delete seen
    for ( i=1; i&lt;=NF; i++ ) {
        wordCnt[$i]++
        if ( !seen[$i]++ ) {
            rowCnt[$i]++
        }
    }
}
END {
    print &quot;values&quot;, &quot;iterationcount&quot;, &quot;countofrows&quot;
    for ( word in wordCnt ) {
        print word, wordCnt[word], rowCnt[word]
    }
}

<p>

$ awk -f tst.awk file | column -s $&#39;\t&#39; -t
values       iterationcount  countofrows
payment      2               2
incident     1               1
the          2               2
and          1               1
been         1               1
reported     2               2
successfull  2               1
Incident     1               1
is           1               1
denied       1               1

huangapple
  • 本文由 发表于 2023年4月13日 22:09:51
  • 转载请务必保留本文链接:https://go.coder-hub.com/76006442.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定