英文:
to get iteration count and total count over the rows from a column in table
问题
We have a database table (source) with some million records in a table: Sample data like (extracted to a text file for sample)
denied the payment
the payment successful and incident reported successful
Incident is been reported
While trying to get the distinct word count on these 3 records, we have replaced spaces with a new line character and then sorted and uniqued them.
sed 's/ /\n/g' file | sort | uniq -c >> new.txt
Output:
denied 1
the 2
payment 2
successful 2
Incident 2
is 1
been 1
reported 2
How can we also get the number of rows for the above output, something like:
values iteration count count of rows
denied 1 1
the 2 2
payment 2 2
successful 2 1 (Although this word is two times but available only in 1 row)
Incident 2 2
is 1 1
been 1 1
reported 2 2
英文:
We have a database table(source) with some million records in a table : Sample data like (extracted to a text file for sample)
denied the payment
the payment successfull and incident reported successfull
Incident is been reported
while trying to get the distint words count on these 3 records . have replaced space with new line character and then sort uniq we have done .
sed 's/ /\n/g' file|sort|uniq -c >> new.txt
output:
denied 1
the 2
payment 2
successfull 2
Incident 2
is 1
been 1
reported 2
how can we also get number of rows for the above output: some thing like
values iterationcount countofrows
denied 1 1
the 2 2
payment 2 2
successfull 2 1 (Although this word is two times but available only in 1 row )
Incident 2 2
is 1 1
been 1 1
reported 2 2
答案1
得分: 2
以下是您要翻译的代码部分:
BEGIN { OFS="\t" }
{
delete seen
for ( i=1; i<=NF; i++ ) {
wordCnt[$i]++
if ( !seen[$i]++ ) {
rowCnt[$i]++
}
}
}
END {
print "values", "iterationcount", "countofrows"
for ( word in wordCnt ) {
print word, wordCnt[word], rowCnt[word]
}
}
$ awk -f tst.awk file | column -s $'\t' -t
values iterationcount countofrows
payment 2 2
incident 1 1
the 2 2
and 1 1
been 1 1
reported 2 2
successfull 2 1
Incident 1 1
is 1 1
denied 1 1
英文:
You can't start by converting blanks to newlines and then try to add getting counts per the original lines.
Using any awk that supports delete array
(which is most of them):
$ cat tst.awk
BEGIN { OFS="\t" }
{
delete seen
for ( i=1; i<=NF; i++ ) {
wordCnt[$i]++
if ( !seen[$i]++ ) {
rowCnt[$i]++
}
}
}
END {
print "values", "iterationcount", "countofrows"
for ( word in wordCnt ) {
print word, wordCnt[word], rowCnt[word]
}
}
<p>
$ awk -f tst.awk file | column -s $'\t' -t
values iterationcount countofrows
payment 2 2
incident 1 1
the 2 2
and 1 1
been 1 1
reported 2 2
successfull 2 1
Incident 1 1
is 1 1
denied 1 1
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论