How to get Top N values based on repetition from a list in Java
Question
I am trying to get the top N values from a list in Java, ranked by how often each value is repeated.
Example: find the top 2 values
[ "strawberries", "orange", "apple", "mango", "grapes", "pineapple", "mango", "strawberries", "mango", "apple"]
Result:
["mango"] // "mango" is repeated 3 times
["strawberries", "apple"] // "strawberries" and "apple" are repeated 2 times each
I have written the code below to achieve this:
private static List<List<String>> getTop(int n, List<String> values) {
    Map<String, Long> valueCountMap = values.stream()
            .collect(groupingBy(x -> x, counting()));
    final Map<String, Long> sortedByCount = valueCountMap.entrySet()
            .stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (x, y) -> y, LinkedHashMap::new));
    List<List<String>> topNValues = new ArrayList<>();
    long prevValue = -1;
    for (Map.Entry<String, Long> e : sortedByCount.entrySet()) {
        if (prevValue == -1 || prevValue != e.getValue()) {
            if (topNValues.size() == n) {
                break;
            }
            prevValue = e.getValue();
            List<String> keys = new ArrayList<>();
            keys.add(e.getKey());
            topNValues.add(keys);
        } else if (prevValue == e.getValue()) {
            List<String> keys = topNValues.get(topNValues.size() - 1);
            keys.add(e.getKey());
        }
    }
    return topNValues;
}
Is there a better way to implement this, in terms of both performance and the implementation itself?
Answer 1
Score: 5
Try this.
import java.util.List;
import java.util.Map.Entry;
import java.util.TreeMap;
import java.util.stream.Collectors;

private static List<List<String>> getTop(int n, List<String> values) {
    return values.stream()
            .collect(Collectors.groupingBy(s -> s, Collectors.counting()))
            .entrySet().stream()
            .collect(Collectors.groupingBy(Entry::getValue,
                    TreeMap::new,
                    Collectors.mapping(Entry::getKey, Collectors.toList())))
            .descendingMap().values().stream()
            .limit(n)
            .collect(Collectors.toList());
}
Input:
[strawberries, orange, apple, mango, grapes, pineapple, mango, strawberries, mango, apple]
Result of the first .collect():
{orange=1, apple=2, pineapple=1, strawberries=2, grapes=1, mango=3}
Result of the second .collect():
{1=[orange, pineapple, grapes], 2=[apple, strawberries], 3=[mango]}
Result of .descendingMap():
{3=[mango], 2=[apple, strawberries], 1=[orange, pineapple, grapes]}
Result of the last .collect():
[[mango], [apple, strawberries]]
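To inspect each stage separately, the same pipeline can be split into small helpers. This is only a sketch: the class name TopNStages and the helper names countOccurrences and groupByCount are made up for illustration and are not part of the answer. The order of values inside one group follows HashMap iteration order, so it may differ from the traces shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; names are illustrative only.
class TopNStages {

    // Stage 1: count how often each value occurs (value -> count).
    static Map<String, Long> countOccurrences(List<String> values) {
        return values.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Stage 2: invert the counts into count -> values, kept sorted by count (ascending).
    static TreeMap<Long, List<String>> groupByCount(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .collect(Collectors.groupingBy(Map.Entry::getValue,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    // Stage 3: read the groups from the highest count down and keep the first n.
    static List<List<String>> getTop(int n, List<String> values) {
        return groupByCount(countOccurrences(values))
                .descendingMap().values().stream()
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("strawberries", "orange", "apple", "mango", "grapes",
                "pineapple", "mango", "strawberries", "mango", "apple");
        Map<String, Long> counts = countOccurrences(input);
        System.out.println(counts);                                // value -> count
        System.out.println(groupByCount(counts));                  // count -> values, ascending
        System.out.println(groupByCount(counts).descendingMap());  // count -> values, descending
        System.out.println(getTop(2, input));                      // top 2 groups
    }
}
```

Because the intermediate map is a TreeMap keyed by count, no explicit sort call is needed; descendingMap() is a reversed view of the map that is already ordered.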
Answer 2
Score: 1
What you seem to be looking for is really a List<Set<String>> as the output. A simplification is possible if you rank the entries by the counts you computed to start with:
private static List<Set<String>> getTopN(int n, List<String> values) {
    Map<String, Long> valueCountMap = values.stream()
            .collect(Collectors.groupingBy(x -> x, Collectors.counting()));
    Map<Long, Set<String>> rankedEntries = values.stream()
            .collect(Collectors.groupingBy(valueCountMap::get, Collectors.toSet()));
    return rankedEntries.entrySet().stream()
            .sorted(Map.Entry.<Long, Set<String>>comparingByKey().reversed())
            .limit(n)
            .map(Map.Entry::getValue)
            .collect(Collectors.toList());
}
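In terms of performance, you already have a decent algorithm to obtain the result. For an input of N elements, the solution above iterates N times to count, performs N lookups on the frequency map, and then sorts and iterates rankedEntries, which never has more than N entries. In fact k distinct counts need at least 1 + 2 + ... + k elements, so rankedEntries holds on the order of √N entries at most and the sort does not dominate. Assuming constant-time hash map operations, you end up with an overall complexity of O(N).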
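To make the linear bound concrete, here is a sketch of a different technique from the two answers: bucketing values by their count, so no comparison sort is involved at all. The class name TopNBuckets is hypothetical, and the method returns lists rather than sets because it iterates over distinct keys only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; a bucket-based alternative, not the code from the answers.
class TopNBuckets {

    static List<List<String>> getTopN(int n, List<String> values) {
        // Pass 1: count occurrences of each value.
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }

        // A value can occur at most values.size() times, which bounds the bucket index.
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i <= values.size(); i++) {
            buckets.add(null);
        }

        // Pass 2: place each distinct value into the bucket for its count.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int count = e.getValue();
            if (buckets.get(count) == null) {
                buckets.set(count, new ArrayList<>());
            }
            buckets.get(count).add(e.getKey());
        }

        // Pass 3: scan from the highest possible count down, keeping the first n non-empty buckets.
        List<List<String>> top = new ArrayList<>();
        for (int count = values.size(); count >= 1 && top.size() < n; count--) {
            if (buckets.get(count) != null) {
                top.add(buckets.get(count));
            }
        }
        return top;
    }
}
```

The trade-off is an extra list of N + 1 slots in exchange for skipping the sort. For typical inputs the stream-based versions are likely just as fast and easier to read.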