May 24, 2023, 18:50:41

How to find the most frequent words in a string using Java 8 streams?

Question


I have a sample string in the input format below. I'm trying to fetch the most frequently repeated words along with their occurrence counts, as shown in the expected output. How can this be achieved with the Java 8 Streams API?

Input:

"Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms."

Expected Output:

Ram -->3
is -->3

Answer 1

Score: 1


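	import java.util.*;
	import java.util.function.Function;
	import java.util.stream.Collectors;

	public class MostFrequentWords {
	    public static void main(String[] args) {
	        String text = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
	        // Split on runs of non-alphanumeric characters, then count the words case-insensitively.
	        List<String> wordsList = Arrays.asList(text.split("[^a-zA-Z0-9]+"));
	        Map<String, Long> wordFrequency = wordsList.stream().map(word -> word.toLowerCase())
	                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

	        // Highest count among all words.
	        long maxCount = Collections.max(wordFrequency.values());

	        // Keep every word whose count equals the maximum, so ties are preserved.
	        Map<String, Long> maxFrequencyList = wordFrequency.entrySet().stream().filter(e -> e.getValue() == maxCount)
	                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

	        System.out.println(maxFrequencyList); // ram=3 and is=3, in unspecified (HashMap) order
	    }
	}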
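
The snippet above relies on HashMap ordering and on the text containing at least one word. As a minimal sketch of one way to harden it, the variant below wraps the same count-then-filter steps in a method. The class name TopWords and the method name mostFrequentWords are hypothetical, and the choices to skip empty tokens and to return an empty map for word-free input are assumptions, not part of the original answer.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TopWords {

    // Hypothetical helper: returns every word that shares the highest count,
    // in the order in which the words first appear in the text.
    static Map<String, Long> mostFrequentWords(String text) {
        // Count case-insensitively; LinkedHashMap keeps first-seen order.
        Map<String, Long> counts = Arrays.stream(text.split("[^a-zA-Z0-9]+"))
                .filter(word -> !word.isEmpty()) // split can yield an empty leading token
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(Function.identity(),
                        LinkedHashMap::new, Collectors.counting()));

        // orElse(0L) avoids an exception when the text contains no words.
        long maxCount = counts.values().stream()
                .mapToLong(Long::longValue)
                .max()
                .orElse(0L);

        // Keep only the entries that tie for the maximum, preserving order.
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == maxCount)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        String text = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        mostFrequentWords(text)
                .forEach((word, count) -> System.out.println(word + " --> " + count));
        // prints:
        // ram --> 3
        // is --> 3
    }
}
```

Because the words are lowercased before counting, the result reports "ram" rather than "Ram". Preserving the first-seen spelling would need an extra mapping step that is not shown here.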

Answer 2

Score: 1


In my opinion, streams are not a very efficient fit for this, because it is difficult to extract and apply information that may change while the stream is being processed, such as a running maximum (unless you write your own collector).

This method uses Java 8+ map enhancements such as merge and computeIfAbsent. It also computes the word frequencies, including ties, in a single iteration. It does this by using two maps.

individualFrequencies - a map of each word's number of occurrences, keyed by the word.
equalFrequencies - a map of the words that share the same frequency, keyed by that frequency.
The Map.merge method computes the frequency of each word encountered; the counts are stored in a Map<String, Integer>.
The other map tallies all the words that have a given frequency. It is declared as Map<Integer, List<String>>.
If the count returned by merge is greater than or equal to maxCount, the word is added to the list that the equalFrequencies map holds for that count. If no list exists yet for that count, a new one is created and the word is added to it. Map.computeIfAbsent facilitates this. Note that this map may accumulate stale entries for lower counts as new ones are added. The entry one ultimately wants is the one retrieved with the maxCount key.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordFrequencyTies {
    public static void main(String[] args) {
        String sentence = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";

        int maxCount = 0;
        Map<String, Integer> individualFrequencies = new HashMap<>();
        Map<Integer, List<String>> equalFrequencies = new HashMap<>();

        for (String word : sentence.toLowerCase().split("[!;:,.\\s]+")) {
            // merge returns the word's updated count.
            int count = individualFrequencies.merge(word, 1, Integer::sum);
            // Record the word only when it ties or beats the running maximum.
            if (count >= maxCount) {
                maxCount = count;
                equalFrequencies
                        .computeIfAbsent(count, k -> new ArrayList<>())
                        .add(word);
            }
        }

        // The list stored under maxCount holds the most frequent words.
        for (String word : equalFrequencies.get(maxCount)) {
            System.out.printf("%s --> %d%n", word, maxCount);
        }
    }
}

Prints:

ram --> 3
is --> 3

It's interesting to note that not all words will appear in the equalFrequencies map. This behavior is dictated by the order in which the words are processed. As soon as one word is repeated, any others that follow won't appear unless they either tie or exceed the current maxCount.
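
To make that observation concrete, here is a small sketch that runs the same loop and then dumps the whole equalFrequencies map. The class name EqualFrequenciesDemo and the helper buildEqualFrequencies are hypothetical, and a TreeMap is swapped in for the HashMap purely so that the entries print in ascending key order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class EqualFrequenciesDemo {

    // Hypothetical helper: runs the same loop as the answer and returns the
    // frequency -> words map so that its leftover entries can be inspected.
    static Map<Integer, List<String>> buildEqualFrequencies(String sentence) {
        int maxCount = 0;
        Map<String, Integer> individualFrequencies = new HashMap<>();
        // TreeMap is used only so that the entries print in ascending key order.
        Map<Integer, List<String>> equalFrequencies = new TreeMap<>();

        for (String word : sentence.toLowerCase().split("[!;:,.\\s]+")) {
            int count = individualFrequencies.merge(word, 1, Integer::sum);
            if (count >= maxCount) {
                maxCount = count;
                equalFrequencies
                        .computeIfAbsent(count, k -> new ArrayList<>())
                        .add(word);
            }
        }
        return equalFrequencies;
    }

    public static void main(String[] args) {
        String sentence = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        buildEqualFrequencies(sentence)
                .forEach((count, words) -> System.out.println(count + " -> " + words));
        // prints:
        // 1 -> [ram, is, employee, of, abc, company]
        // 2 -> [ram, is]
        // 3 -> [ram, is]
    }
}
```

The words from, blore, good, in and algorithms never enter the map, because each first appears after maxCount has already reached 2. The entries under keys 1 and 2 are the stale leftovers mentioned earlier; only the entry under the final maxCount (3) is the answer.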
