Question

We have a database table (source) with several million records. Sample data (extracted to a text file):

 denied the payment
 the payment successful and incident reported successful
 Incident is been reported

To get the count of each distinct word in these 3 records, we replaced spaces with newline characters and then ran sort and uniq:

sed 's/ /\n/g' file | sort | uniq -c >> new.txt 
Output: 
denied        1
the           2
payment       2
successful    2
Incident      2
is            1
been          1
reported      2

How can we also get, for each word, the number of rows it appears in? Something like:

values   iteration count   count of rows

denied         1              1
the            2              2
payment        2              2 
successful     2              1  (the word occurs twice, but only in 1 row)
Incident       2              2
is             1              1
been           1              1
reported       2              2

Answer 1

Score: 2


You can't start by converting blanks to newlines and then try to add per-row counts afterwards: once every word is on its own line, the information about which original row it came from is gone.
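If you want to stay with a sort/uniq pipeline, one way around this (a sketch, not part of the original answer) is to tag each word with its row number before splitting, so that repeats of a word within the same row can be dropped with sort -u. The function name rows_per_word is hypothetical, and the demo uses the three sample rows from the question. This prints only the row counts; the total counts still come from the question's sed | sort | uniq -c pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each word, print how many distinct rows contain it.
# Reads the files given as arguments, or stdin if there are none.
# Steps:
#   1. awk prints "rownumber word" for every word, so the row is not lost.
#   2. sort -u drops repeats of the same word within the same row.
#   3. The second awk keeps only the word; sort | uniq -c counts the rows.
rows_per_word() {
    awk '{ for (i = 1; i <= NF; i++) print NR, $i }' "$@" |
        LC_ALL=C sort -u |
        awk '{ print $2 }' |
        LC_ALL=C sort |
        uniq -c
}

# Demo on the three sample rows from the question.
printf '%s\n' ' denied the payment' \
    ' the payment successful and incident reported successful' \
    ' Incident is been reported' | rows_per_word
```

Like the awk script below, this is case-sensitive, so Incident and incident are counted separately.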

Using any awk that supports deleting a whole array with delete (which is most of them):

$ cat tst.awk
BEGIN { OFS="\t" }
{
    delete seen                  # new row: forget the words seen on the previous row
    for ( i=1; i<=NF; i++ ) {
        wordCnt[$i]++            # total occurrences of the word
        if ( !seen[$i]++ ) {
            rowCnt[$i]++         # count the row only on the word's first occurrence in it
        }
    }
}
END {
    print "values", "iterationcount", "countofrows"
    for ( word in wordCnt ) {
        print word, wordCnt[word], rowCnt[word]
    }
}

$ awk -f tst.awk file | column -s $'\t' -t
values       iterationcount  countofrows
payment      2               2
incident     1               1
the          2               2
and          1               1
been         1               1
reported     2               2
successful   2               1
Incident     1               1
is           1               1
denied       1               1
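The output above lists Incident and incident separately because awk compares strings case-sensitively, whereas the expected output in the question counts them together. A minimal sketch of a case-insensitive variant, assuming lowercased words in the output are acceptable, folds each word with tolower() before counting. The function name word_and_row_counts is hypothetical, and the header line is omitted so that the rows can be sorted (the order of for (w in ...) is unspecified in awk).

```shell
#!/usr/bin/env bash
# Hypothetical case-insensitive variant of tst.awk: prints
# word<TAB>iterationcount<TAB>countofrows, with every word lowercased.
word_and_row_counts() {
    awk '
        BEGIN { OFS = "\t" }
        {
            delete seen                          # new row: forget words seen so far
            for (i = 1; i <= NF; i++) {
                w = tolower($i)                  # fold "Incident" and "incident" together
                wordCnt[w]++                     # total occurrences
                if (!seen[w]++) rowCnt[w]++      # count each row once per word
            }
        }
        END { for (w in wordCnt) print w, wordCnt[w], rowCnt[w] }
    ' "$@" | LC_ALL=C sort   # array iteration order is unspecified, so sort it
}

# Demo on the three sample rows from the question.
printf '%s\n' ' denied the payment' \
    ' the payment successful and incident reported successful' \
    ' Incident is been reported' | word_and_row_counts
```

With the sample rows this gives incident with counts 2 and 2, and successful with 2 and 1, matching the question's expected counts for those words.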

Get the iteration count and total row count on a column in a table.
