How to find the most frequent words in a string using Java 8 streams?

Question

I have a sample string in the input format below. I'm trying to fetch the most repeated words along with their occurrence counts, as shown in the expected output. How can this be achieved using the Java 8 Streams API?

Input:

"Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms."

Expected Output:

Ram -->3
is -->3

Answer 1

Score: 1

import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

String text = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
// Split on runs of non-alphanumeric characters so that punctuation is dropped.
List<String> wordsList = Arrays.asList(text.split("[^a-zA-Z0-9]+"));
// Count the occurrences of each word, ignoring case.
Map<String, Long> wordFrequency = wordsList.stream().map(String::toLowerCase)
		.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

long maxCount = Collections.max(wordFrequency.values());

// Keep only the words whose count equals the maximum, so ties are included.
Map<String, Long> maxFrequencyList = wordFrequency.entrySet().stream().filter(e -> e.getValue() == maxCount)
		.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

System.out.println(maxFrequencyList);
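
For anyone who wants to run this directly, here is a minimal self-contained sketch of the same stream approach. The class name MostFrequentWords and the method mostFrequent are made up for illustration. It also assumes two small changes that are not in the answer above: a LinkedHashMap keeps the words in order of first appearance, and empty tokens and empty input are guarded against. The words are printed in lower case because the counting ignores case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MostFrequentWords {

    // Hypothetical helper: returns the words sharing the highest count,
    // in order of first appearance.
    public static Map<String, Long> mostFrequent(String text) {
        Map<String, Long> counts = Arrays.stream(text.split("[^a-zA-Z0-9]+"))
                .filter(w -> !w.isEmpty())   // a leading separator yields an empty token
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()));

        if (counts.isEmpty()) {
            return Collections.emptyMap();   // Collections.max would throw on an empty collection
        }

        long max = Collections.max(counts.values());

        // Second pass: keep only the entries whose count equals the maximum.
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == max)
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        String text = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        mostFrequent(text).forEach((word, count) -> System.out.println(word + " -->" + count));
        // prints:
        // ram -->3
        // is -->3
    }
}
```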

Answer 2

Score: 1

IMO, streams are not a very efficient fit for this, because it is difficult to extract and act on information that may or may not change while the elements are being processed from within the stream (unless you write your own collector).
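
To illustrate the "own collector" remark, a custom collector could look roughly like the sketch below. This is an assumption about what such a collector might be, not code from this answer: the names MaxTiesCollector, Best, maxTies and mostFrequent are all hypothetical. The accumulator tracks the running maximum and the words that reached it, so the ties are found in a single pass over the counted entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class MaxTiesCollector {

    // Hypothetical mutable accumulator: the highest count seen so far
    // and the words that reached it.
    static final class Best {
        long max = 0;
        final List<String> words = new ArrayList<>();

        void accept(Map.Entry<String, Long> e) {
            if (e.getValue() > max) {        // new maximum: earlier ties are obsolete
                max = e.getValue();
                words.clear();
            }
            if (e.getValue() == max) {
                words.add(e.getKey());
            }
        }

        Best merge(Best other) {             // combiner, only needed for parallel streams
            if (other.max > max) return other;
            if (other.max == max) words.addAll(other.words);
            return this;
        }

        Map<String, Long> finish() {         // finisher: word -> shared maximum count
            Map<String, Long> result = new LinkedHashMap<>();
            for (String w : words) result.put(w, max);
            return result;
        }
    }

    static Collector<Map.Entry<String, Long>, Best, Map<String, Long>> maxTies() {
        return Collector.of(Best::new, Best::accept, Best::merge, Best::finish);
    }

    public static Map<String, Long> mostFrequent(String text) {
        // Count the words first, keeping first-appearance order.
        Map<String, Long> counts = Arrays.stream(text.toLowerCase().split("[^a-z0-9]+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()));
        // Then reduce the entries to the tied maxima with the custom collector.
        return counts.entrySet().stream().collect(maxTies());
    }

    public static void main(String[] args) {
        String text = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        mostFrequent(text).forEach((w, c) -> System.out.println(w + " --> " + c));
    }
}
```

The word counting still needs its own pass here; the collector only replaces the separate "find the maximum, then filter" steps.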

The approach below uses Java 8+ Map enhancements such as merge and computeIfAbsent. It computes the word frequencies, including ties, in a single iteration. It does this by using two maps.

  • individualFrequencies - A map of each word's number of occurrences, keyed by the word.
  • equalFrequencies - A map of the words that share the same frequency, keyed by that frequency.
  • The Map.merge method computes the frequency of each word encountered; the counts are stored in a Map<String, Integer>.
  • The other map collects all the words that have a given frequency. It is declared as Map<Integer, List<String>>.
  • If the count returned by merge is greater than or equal to maxCount, the word is added to the list that the equalFrequencies map holds for that count. If no list exists for that count yet, a new one is created and the word is added to it; Map.computeIfAbsent handles this. Note that this map can accumulate outdated entries for lower counts as new entries are added. The entry one wants at the end is the one retrieved with the maxCount key.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

String sentence = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";

int maxCount = 0;
Map<String, Integer> individualFrequencies = new HashMap<>();
Map<Integer, List<String>> equalFrequencies = new HashMap<>();

// Lower-case the text, then split on punctuation and whitespace.
for (String word : sentence.toLowerCase().split("[!;:,.\\s]+")) {
    // merge stores 1 for a new word, otherwise adds 1, and returns the new count.
    int count = individualFrequencies.merge(word, 1, Integer::sum);
    if (count >= maxCount) {
        maxCount = count;
        equalFrequencies
                .computeIfAbsent(count, k -> new ArrayList<>())
                .add(word);
    }
}

// The list stored under maxCount holds every word tied for the highest frequency.
for (String word : equalFrequencies.get(maxCount)) {
    System.out.printf("%s --> %d%n", word, maxCount);
}

prints

ram --> 3
is --> 3

It's interesting to note that not all words will appear in the equalFrequencies map. This behavior is dictated by the order in which the words are processed. As soon as one word is repeated, any others that follow won't appear unless they either tie or exceed the current maxCount.
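
To make that last point concrete, the following sketch runs the same loop and prints the whole equalFrequencies map instead of only the maxCount entry. The class name EqualFrequenciesDump and the method build are made up for illustration, and a TreeMap is assumed in place of the HashMap purely so that the keys print in ascending order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EqualFrequenciesDump {

    // Hypothetical helper: runs the loop from the answer and returns the
    // complete frequency-to-words map, including the outdated entries.
    public static Map<Integer, List<String>> build(String sentence) {
        int maxCount = 0;
        Map<String, Integer> individualFrequencies = new HashMap<>();
        Map<Integer, List<String>> equalFrequencies = new TreeMap<>(); // sorted keys for readable output

        for (String word : sentence.toLowerCase().split("[!;:,.\\s]+")) {
            int count = individualFrequencies.merge(word, 1, Integer::sum);
            if (count >= maxCount) {
                maxCount = count;
                equalFrequencies.computeIfAbsent(count, k -> new ArrayList<>()).add(word);
            }
        }
        return equalFrequencies;
    }

    public static void main(String[] args) {
        String sentence = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        build(sentence).forEach((count, words) -> System.out.println(count + " = " + words));
        // prints:
        // 1 = [ram, is, employee, of, abc, company]
        // 2 = [ram, is]
        // 3 = [ram, is]
    }
}
```

The entries under 1 and 2 are the outdated ones. Words such as "from", "blore" and "good" never appear, because maxCount was already 2 or more when they were first seen.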

huangapple
  • Posted on May 24, 2023, 18:50:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76322711.html