April 3, 2020, 20:01:04

Simplify Java stream to find duplicate properties

Question

I have a list of users and I want to find all names that occur more than once:

var allNames = users
    .stream()
    .map(u -> u.getName())
    .collect(Collectors.toList());

var duplicateNames = allNames
    .stream()
    .filter(i -> Collections.frequency(allNames, i) > 1)
    .collect(Collectors.toSet());

Can I improve or simplify the solution above?

Right now I build a list of all names and then filter it. How can I find the duplicate names without creating the additional list allNames?

Answer 1

Score: 7

One solution is:

var duplicate = users.stream()
    .collect(Collectors.toMap(User::getName, u -> false, (x, y) -> true))
    .entrySet().stream()
    .filter(Map.Entry::getValue)
    .map(Map.Entry::getKey)
    .collect(Collectors.toSet());

This creates an intermediate Map<String, Boolean> that records which names occur more than once. Instead of collecting to a new Set, you could use the keySet() of that map (with an explicit HashMap::new, since the three-argument toMap does not guarantee a mutable map):

var duplicate = users.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toMap(User::getName, u -> false, (x, y) -> true, HashMap::new),
        m -> {
            m.values().removeIf(dup -> !dup);
            return m.keySet();
        }));
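To make the role of the merge function concrete, here is a minimal, self-contained sketch of the first variant. The User class, the class name ToMapDuplicateDemo and the helper names are hypothetical stand-ins, not taken from the question; names are assumed to be non-null.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class ToMapDuplicateDemo {

    // Hypothetical stand-in for the question's User type.
    static final class User {
        private final String name;
        User(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Maps each name to false on first sight; the merge function
    // replaces that with true as soon as the same name shows up again.
    static Map<String, Boolean> flagDuplicates(List<User> users) {
        return users.stream()
            .collect(Collectors.toMap(User::getName, u -> false, (x, y) -> true));
    }

    // Keeps only the names whose flag is true.
    static Set<String> duplicateNames(List<User> users) {
        return flagDuplicates(users).entrySet().stream()
            .filter(Map.Entry::getValue)
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<User> users = List.of(
            new User("ann"), new User("bob"), new User("ann"), new User("cy"));
        System.out.println(flagDuplicates(users)); // entry order is unspecified
        System.out.println(duplicateNames(users));
    }
}
```

With this input only "ann" is flagged true, so the resulting set contains just that name.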

A loop solution can be much simpler:

HashSet<String> seen = new HashSet<>(), duplicate = new HashSet<>();
for (User u : users)
    // Set.add returns false if the name was already present
    if (!seen.add(u.getName())) duplicate.add(u.getName());
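The loop can also be wrapped in a small reusable method. The following is only a sketch: the class name LoopDuplicateDemo and the helper findDuplicateKeys are hypothetical, and the demo runs on plain strings rather than the question's User type.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class LoopDuplicateDemo {

    // Returns every key that occurs more than once among the items.
    // findDuplicateKeys is a hypothetical helper name.
    static <T, K> Set<K> findDuplicateKeys(Iterable<T> items,
                                           Function<? super T, ? extends K> keyOf) {
        Set<K> seen = new HashSet<>();
        Set<K> duplicates = new HashSet<>();
        for (T item : items) {
            K key = keyOf.apply(item);
            // Set.add returns false when the key was already present.
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> names = List.of("ann", "bob", "ann", "cy", "bob", "ann");
        Set<String> duplicates = findDuplicateKeys(names, Function.identity());
        System.out.println(duplicates); // contains "ann" and "bob"
    }
}
```

For the question's data the call would presumably look like findDuplicateKeys(users, User::getName). A name that occurs three or more times is still reported once, because the result is a Set.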

Answer 2

Score: 2

Group by name, then keep the groups with more than one user:

Map<String, List<User>> grouped = users.stream()
    .collect(Collectors.groupingBy(User::getName));

List<User> duplicated =
    grouped.values().stream()
        .filter(v -> v.size() > 1)
        .flatMap(List::stream)
        .collect(Collectors.toList());

(You can do this in a single expression if you want. I only separated the steps to make it a little clearer what is happening.)

Note that this does not preserve the order of the users from the original list, because the map returned by groupingBy makes no ordering guarantee.
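If the original order matters, one possible workaround is to count the names first and then filter the original list, so the users come out in the order they went in. This is a sketch under assumptions: the User class with an id field, the class name OrderedDuplicatesDemo and the method duplicatedInOrder are all hypothetical, and names are assumed to be non-null.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class OrderedDuplicatesDemo {

    // Hypothetical stand-in for the question's User type; the id only
    // exists so that users sharing a name can be told apart.
    static final class User {
        private final int id;
        private final String name;
        User(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
        String getName() { return name; }
    }

    // Counts each name once, then filters the original list so that
    // the surviving users keep their original relative order.
    static List<User> duplicatedInOrder(List<User> users) {
        Map<String, Long> counts = users.stream()
            .collect(Collectors.groupingBy(User::getName, Collectors.counting()));
        return users.stream()
            .filter(u -> counts.get(u.getName()) > 1)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<User> users = List.of(
            new User(1, "ann"), new User(2, "bob"), new User(3, "cy"),
            new User(4, "bob"), new User(5, "ann"));
        duplicatedInOrder(users)
            .forEach(u -> System.out.println(u.getId() + " " + u.getName()));
    }
}
```

For this input the result keeps users 1, 2, 4 and 5 in that order, whereas the grouped version would emit them group by group.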

Answer 3

Score: 1

I found a solution with the help of @holger:

// collect all duplicate names in O(n)
var duplicateNames = all.stream()
    .collect(Collectors.groupingBy(Strategy::getName, Collectors.counting()))
    .entrySet()
    .stream()
    .filter(m -> m.getValue() > 1)
    .map(m -> m.getKey())
    .collect(Collectors.toList());

Is the performance of this solution O(n^2) or O(n)?

If anyone can find improvements, please share.
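On the complexity question: assuming hash-based collections and well-distributed hash codes, the groupingBy/counting pipeline makes one pass over the input and one pass over the map entries, so it should run in expected O(n) time. The Collections.frequency version from the question rescans the whole list for every element, which is O(n^2). The sketch below puts both side by side on plain strings; the class and method names are hypothetical, and the timing is a rough illustration, not a proper benchmark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

class DuplicateComplexityDemo {

    // Expected O(n): one pass to count, one pass over the distinct names.
    static Set<String> viaCounting(List<String> names) {
        return names.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
            .entrySet().stream()
            .filter(e -> e.getValue() > 1)
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());
    }

    // O(n^2): Collections.frequency walks the whole list for every element.
    static Set<String> viaFrequency(List<String> names) {
        return names.stream()
            .filter(n -> Collections.frequency(names, n) > 1)
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            names.add("user" + (i % 7_500)); // the first 2,500 names appear twice
        }
        long t0 = System.nanoTime();
        Set<String> fast = viaCounting(names);
        long t1 = System.nanoTime();
        Set<String> slow = viaFrequency(names);
        long t2 = System.nanoTime();
        System.out.println("same result: " + fast.equals(slow));
        System.out.println("counting:  " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("frequency: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both methods return the same set; only the amount of work differs, and the gap widens as the list grows.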
