Count the frequency of each word from a list of Strings using Java 8

Question

I have two lists of strings. I need to create a map that records, for each string in one list, how many strings of the other list contain it. If a string occurs more than once within a single string, it should still be counted as one occurrence.

For example:

String[] listA = {"the", "you", "how"};
String[] listB = {"the dog ate the food", "how is the weather", "how are you"};

The Map<String, Integer> will use the strings from listA as keys and their occurrence counts as values, so the map will contain the key-value pairs ("the", 2), ("you", 1), ("how", 2).

Note: Although "the" appears twice in "the dog ate the food", it counts as only one occurrence because both appearances are in the same string.

How do I write this using Java streams? I tried the following approach, but it does not work:

Set<String> sentenceSet = Stream.of(listB).collect(Collectors.toSet());

Map<String, Long> frequency1 = Stream.of(listA)
    .filter(e -> sentenceSet.contains(e))
    .collect(Collectors.groupingBy(t -> t, Collectors.counting()));

Answer 1

Score: 2

You need to extract all the words from listB and keep only those that are also listed in listA. Then you simply collect the word -> count pairs into a Map<String, Long>:

import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

String[] listA = {"the", "you", "how"};
String[] listB = {"the dog ate the food", "how is the weather", "how are you"};

Set<String> qualified = new HashSet<>(Arrays.asList(listA));   // make searching easier

Map<String, Long> map = Arrays.stream(listB)   // stream the sentences
    .map(sentence -> sentence.split("\\s+"))   // split into words: Stream<String[]>
    .flatMap(words -> Arrays.stream(words)     // flatMap to Stream<String>
                            .distinct())       // ... keeping each word once per sentence
    .filter(qualified::contains)               // keep only the qualified words
    .collect(Collectors.groupingBy(            // collect to the Map
        Function.identity(),                   // ... the key is the word itself
        Collectors.counting()));               // ... the value is its frequency

Output:

> {the=2, how=2, you=1}
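
If the result should follow the question more literally (Integer values, one entry for every word of listA even when it never occurs, keys in listA order), an alternative is to stream listA and count the sentences that contain each word. This is only a sketch of a different technique from the flatMap pipeline above. The class name SentenceOccurrences and the helpers distinctWords and countPerWord are made up for illustration, and it assumes words are separated by whitespace with no punctuation to strip.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class SentenceOccurrences {

    // Hypothetical helper: the distinct whitespace-separated words of one sentence.
    static Set<String> distinctWords(String sentence) {
        return new HashSet<>(Arrays.asList(sentence.trim().split("\\s+")));
    }

    // Hypothetical helper: for each word, the number of sentences that contain it at least once.
    static Map<String, Integer> countPerWord(String[] words, String[] sentences) {
        // Split every sentence once, so each sentence contributes at most 1 per word.
        List<Set<String>> wordSets = Arrays.stream(sentences)
            .map(SentenceOccurrences::distinctWords)
            .collect(Collectors.toList());

        return Arrays.stream(words)
            .distinct()                                  // ignore repeated keys in the first list
            .collect(Collectors.toMap(
                word -> word,                            // key: the word itself
                word -> (int) wordSets.stream()          // value: number of sentences containing it
                                      .filter(set -> set.contains(word))
                                      .count(),
                (first, second) -> first,                // unreachable after distinct(), required by this overload
                LinkedHashMap::new));                    // keep the order of the first list
    }

    public static void main(String[] args) {
        String[] listA = {"the", "you", "how"};
        String[] listB = {"the dog ate the food", "how is the weather", "how are you"};
        System.out.println(countPerWord(listA, listB)); // prints {the=2, you=1, how=2}
    }
}
```

This version rescans every sentence set for each word of listA, which is fine for small inputs. For long lists, the single pass over listB shown in the answer is probably the better fit.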

Answer 2

Score: 0

I suggest creating a hash table of the items in the first list. Then loop through the items in the second list, checking whether each word is in the hash table. When adding the elements of the first list, test whether an element is already there and decide whether you want to keep a count. You could, for instance, store which sentences a word appears in as the value for its key.
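
A loop-based reading of this suggestion might look like the sketch below. It is one interpretation, not necessarily what the answer's author had in mind: it keeps a plain count per word rather than the list of sentences, and uses a per-sentence set so that a word repeated within one sentence is counted once. The class name LoopWordCounter and the method count are made up for illustration, and words are assumed to be separated by whitespace.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class LoopWordCounter {

    // Hypothetical helper: counts, for each word, the sentences in which it appears.
    static Map<String, Integer> count(String[] words, String[] sentences) {
        // Hash table of the items in the first list, each starting at zero.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.putIfAbsent(word, 0);   // a word listed twice is only added once
        }

        for (String sentence : sentences) {
            Set<String> seen = new HashSet<>();   // words already counted for this sentence
            for (String token : sentence.trim().split("\\s+")) {
                // Count only tracked words, and only the first time they show up here.
                if (counts.containsKey(token) && seen.add(token)) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] listA = {"the", "you", "how"};
        String[] listB = {"the dog ate the food", "how is the weather", "how are you"};
        // Iteration order of a HashMap is not specified, so look the keys up directly.
        Map<String, Integer> counts = count(listA, listB);
        for (String word : listA) {
            System.out.println(word + " -> " + counts.get(word));
        }
    }
}
```

To store the sentences themselves, as the answer hints, the value type could be changed to a collection of sentence indexes instead of an Integer.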

huangapple
  • Posted on April 7, 2020, 18:21:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/61077794.html