How to get Top N values based on repetition from a list in Java

Question

I am trying to get the top N values from a list in Java, ranked by how many times each value is repeated.

Example: find the top 2 values

[ "strawberries", "orange", "apple", "mango", "grapes", "pineapple", "mango", "strawberries", "mango", "apple"]

Result:

["mango"] // "mango" is repeated 3 times
["strawberries", "apple"] // "strawberries" and "apple" are repeated 2 times each

I have written the code below to achieve this:

private static List<List<String>> getTop(int n, List<String> values) {

    Map<String, Long> valueCountMap = values.stream()
                           .collect(groupingBy(x -> x, counting()));

    final Map<String, Long> sortedByCount = valueCountMap.entrySet()
            .stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (x, y) -> y, LinkedHashMap::new));

    List<List<String>> topNValues = new ArrayList<>();
    long prevValue = -1;
    for (Map.Entry<String, Long> e : sortedByCount.entrySet()) {
        if (prevValue == -1 || prevValue != e.getValue()) {
            if (topNValues.size() == n) {
                break;
            }
            prevValue = e.getValue();
            List<String> keys = new ArrayList<>();
            keys.add(e.getKey());
            topNValues.add(keys);
        } else if (prevValue == e.getValue()) {
            List<String> keys = topNValues.get(topNValues.size() - 1);
            keys.add(e.getKey());
        }
    }

    return topNValues;
}

I want to know if there is a better way to implement this, in terms of both performance and implementation.

Answer 1

Score: 5

Try this.

private static List<List<String>> getTop(int n, List<String> values) {
    return values.stream()
        .collect(Collectors.groupingBy(s -> s, Collectors.counting()))
        .entrySet().stream()
        .collect(Collectors.groupingBy(Entry::getValue,
            TreeMap::new,
            Collectors.mapping(Entry::getKey, Collectors.toList())))
        .descendingMap().values().stream()
        .limit(n)
        .collect(Collectors.toList());
}

Input:

[strawberries, orange, apple, mango, grapes, pineapple, mango, strawberries, mango, apple]

Result of the first .collect():

{orange=1, apple=2, pineapple=1, strawberries=2, grapes=1, mango=3}

Result of the second .collect():

{1=[orange, pineapple, grapes], 2=[apple, strawberries], 3=[mango]}

Result of .descendingMap():

{3=[mango], 2=[apple, strawberries], 1=[orange, pineapple, grapes]}

Result of the last .collect():

[[mango], [apple, strawberries]]
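
The intermediate results listed above can be reproduced with a sketch like the following, which splits the same pipeline into named steps and prints each one. The class name TopNWalkthrough and the local variable names are illustrative assumptions, not part of the original answer. The first map is a HashMap, so the order of entries inside it, and the order of values within a tie, may differ from the listing above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical wrapper class; the name is an assumption made for this sketch.
class TopNWalkthrough {

    static List<List<String>> getTop(int n, List<String> values) {
        // Step 1: value -> number of occurrences (HashMap, iteration order unspecified).
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        System.out.println("counts:     " + counts);

        // Step 2: occurrence count -> values with that count, keys sorted ascending.
        TreeMap<Long, List<String>> byCount = counts.entrySet().stream()
                .collect(Collectors.groupingBy(Map.Entry::getValue,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
        System.out.println("byCount:    " + byCount);

        // Step 3: a view of the same map from the highest count to the lowest.
        Map<Long, List<String>> descending = byCount.descendingMap();
        System.out.println("descending: " + descending);

        // Step 4: keep only the groups for the n highest counts.
        return descending.values().stream()
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("strawberries", "orange", "apple", "mango",
                "grapes", "pineapple", "mango", "strawberries", "mango", "apple");
        System.out.println("top 2:      " + getTop(2, fruits));
    }
}
```

If a stable order within each group matters, one option is to pass LinkedHashMap::new as the map factory in step 1, so that values keep their first-appearance order.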

Answer 2

Score: 1

What you seem to be looking for is really a List<Set<String>> as output. A simplification is possible if you rank the entries by the counts you computed at the start:

private static List<Set<String>> getTopN(int n, List<String> values) {
    Map<String, Long> valueCountMap = values.stream()
            .collect(Collectors.groupingBy(x -> x, Collectors.counting()));

    Map<Long, Set<String>> rankedEntries = values.stream()
            .collect(Collectors.groupingBy(valueCountMap::get, Collectors.toSet()));

    return rankedEntries.entrySet().stream()
            .sorted(Map.Entry.<Long, Set<String>>comparingByKey().reversed())
            .limit(n)
            .map(Map.Entry::getValue)
            .collect(Collectors.toList());
}

In terms of performance, your algorithm is already decent. For an input of N elements, the solution above makes one pass of N steps to count, performs N lookups on the frequency map to build rankedEntries, and then sorts and iterates rankedEntries. The number of distinct counts k satisfies 1 + 2 + ... + k <= N, so rankedEntries has at most about sqrt(2N) entries and the sort does not dominate. Assuming constant-time hash lookups, the overall complexity is O(N).
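
As a rough illustration of the linear-time idea, here is a sketch that avoids sorting altogether by bucketing values by their count. This is a different technique from the stream-based code above, not the answer's own method, and the class name TopNBuckets is an assumption made for this example. It relies on the fact that no count can exceed the size of the input list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name; a bucket-based alternative, not the code from the answer.
class TopNBuckets {

    static List<List<String>> getTop(int n, List<String> values) {
        // Pass 1: count how many times each value occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum);
        }

        // Pass 2: buckets.get(c) collects every value that occurs exactly c times.
        // A count is at most values.size(), so values.size() + 1 slots are enough.
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i <= values.size(); i++) {
            buckets.add(null);
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            List<String> bucket = buckets.get(e.getValue());
            if (bucket == null) {
                bucket = new ArrayList<>();
                buckets.set(e.getValue(), bucket);
            }
            bucket.add(e.getKey());
        }

        // Pass 3: walk from the highest possible count down and
        // collect the first n non-empty buckets.
        List<List<String>> top = new ArrayList<>();
        for (int c = values.size(); c >= 1 && top.size() < n; c--) {
            if (buckets.get(c) != null) {
                top.add(buckets.get(c));
            }
        }
        return top;
    }
}
```

The trade-off is an extra list with one slot per input element, in exchange for three simple passes and no comparison-based sort. The order of values within a tie still depends on HashMap iteration order.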

huangapple
  • Posted on July 27, 2020 at 07:24:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63106815.html