Sorting and Grouping on a list of objects
Question
I have a List of Procedure objects as below:
Procedure1	01/01/2020
Procedure2	03/01/2020
Procedure3	03/01/2020
Procedure1	04/01/2020
Procedure5	05/01/2020, 02/01/2020
Procedure2	06/01/2020
and my Procedure class looks like this:
class Procedure {
	List<Date> procedureDate;
	String procedureName;
}
I want to sort and group the objects based on the conditions below.
- All procedures should be grouped by procedure name.
- Procedures must be in descending order of procedure date (the first element of the date list, i.e. procedureDate.get(0)).
- Procedures with the same name, grouped together, should be in descending order of date.

The end result must be:
Procedure2	06/01/2020
Procedure2	03/01/2020
Procedure5	05/01/2020, 02/01/2020
Procedure1	04/01/2020
Procedure1	01/01/2020
Procedure3	03/01/2020
I was able to achieve this using a Comparator and old-style Java code. Is it possible to achieve the same using Java 8 streams, collectors and groupingBy?
Answer 1
Score: 2
This is a very interesting question. The solution is not as easy as it looks. You have to divide it into several steps:
- Get the maximum first date (the first element of the List<Date>) for each procedureName group.
- Compare the Procedure instances by the maximum Date value from the Map<String, Date> created in step one.
- If they are equal, distinguish them by name (e.g. two entries named Procedure2).
- If they are still equal, sort the Procedure instances by their actual first date.
Here is the demo at: https://www.jdoodle.com/iembed/v0/Te.
Step 1
List<Procedure> procedures = ...
Map<String, Date> map = procedures.stream().collect(
    Collectors.collectingAndThen(
        Collectors.groupingBy(
            Procedure::getProcedureName,
            Collectors.maxBy(Comparator.comparing(s -> s.getProcedureDate().get(0)))),
        s -> s.entrySet().stream()
            .filter(e -> e.getValue().isPresent())
            .collect(Collectors.toMap(
                Map.Entry::getKey,
                e -> e.getValue().get().getProcedureDate().get(0)))));
Explained: there is a simple way to get, for each procedureName, the Procedure with the maximum first date:
Map<String, Optional<Procedure>> mapOfOptionalProcedures = procedures.stream()
    .collect(Collectors.groupingBy(
             Procedure::getProcedureName,
             Collectors.maxBy(Comparator.comparing(o -> o.getProcedureDate().get(0)))));
However, the returned structure (Map<String, Optional<Procedure>>) is a bit clumsy. To make it useful and return the Date directly, wrap the grouping collector in Collectors::collectingAndThen, which takes a Function as a result mapper:
Map<String, Date> map = procedures.stream().collect(
    Collectors.collectingAndThen(
        /* grouping part */,
        s -> s.entrySet().stream()
            .filter(e -> e.getValue().isPresent())
            .collect(Collectors.toMap(
                    Map.Entry::getKey,
                    e -> e.getValue().get().getProcedureDate().get(0)))));
... which is effectively the first snippet.
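As a possible shortcut for step 1, the Optional can be avoided altogether by collecting with Collectors.toMap and a merge function that keeps the later of two dates. This is a different collector from the one used above, shown only as a sketch: the Procedure constructor, the getters, and the jan and sample helpers are assumptions made for illustration and are not part of the question.

```java
import java.util.Arrays;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MaxDatePerName {

    // Minimal stand-in for the question's class; constructor and getters are assumed.
    static class Procedure {
        private final String procedureName;
        private final List<Date> procedureDate;

        Procedure(String procedureName, Date... dates) {
            this.procedureName = procedureName;
            this.procedureDate = Arrays.asList(dates);
        }

        String getProcedureName() { return procedureName; }
        List<Date> getProcedureDate() { return procedureDate; }
    }

    // Hypothetical helper: midnight on the given day of January 2020.
    static Date jan(int day) {
        return new GregorianCalendar(2020, Calendar.JANUARY, day).getTime();
    }

    // The six entries from the question, read as days of January 2020.
    static List<Procedure> sample() {
        return Arrays.asList(
            new Procedure("Procedure1", jan(1)),
            new Procedure("Procedure2", jan(3)),
            new Procedure("Procedure3", jan(3)),
            new Procedure("Procedure1", jan(4)),
            new Procedure("Procedure5", jan(5), jan(2)),
            new Procedure("Procedure2", jan(6)));
    }

    // Latest first date per procedure name; the merge function resolves
    // duplicate names by keeping the later of the two dates.
    static Map<String, Date> maxFirstDateByName(List<Procedure> procedures) {
        return procedures.stream().collect(Collectors.toMap(
            Procedure::getProcedureName,
            p -> p.getProcedureDate().get(0),
            (a, b) -> a.after(b) ? a : b));
    }

    public static void main(String[] args) {
        maxFirstDateByName(sample())
            .forEach((name, date) -> System.out.println(name + " -> " + date));
    }
}
```

Like the original snippet, this assumes every Procedure has at least one date; an empty list would make get(0) throw.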
Steps 2, 3 and 4
Basically, sort by the maximum date of each group (descending), then by name, and finally by the actual first date (descending).
Collections.sort(
    procedures,
    (l, r) -> {
        int dates = map.get(r.getProcedureName()).compareTo(map.get(l.getProcedureName()));
        if (dates == 0) {
             int names =  l.getProcedureName().compareTo(r.getProcedureName());
             if (names == 0) {
                 return r.getProcedureDate().get(0).compareTo(l.getProcedureDate().get(0));
             } else return names;
        } else return dates;
    }
);
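The same three-level ordering can also be written as a Comparator chain, which some readers may find easier to follow than nested if statements. The sketch below assumes a Procedure class with a constructor and getters; the jan, sample and maxFirstDates helpers are hypothetical, and maxFirstDates is only a plain-loop stand-in for the map built in step 1.

```java
import java.util.Arrays;
import java.util.Calendar;
import java.util.Comparator;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupSortComparator {

    // Minimal stand-in for the question's class; constructor and getters are assumed.
    static class Procedure {
        private final String procedureName;
        private final List<Date> procedureDate;

        Procedure(String procedureName, Date... dates) {
            this.procedureName = procedureName;
            this.procedureDate = Arrays.asList(dates);
        }

        String getProcedureName() { return procedureName; }
        List<Date> getProcedureDate() { return procedureDate; }

        @Override
        public String toString() { return procedureName + " " + procedureDate; }
    }

    // Hypothetical helper: midnight on the given day of January 2020.
    static Date jan(int day) {
        return new GregorianCalendar(2020, Calendar.JANUARY, day).getTime();
    }

    // The six entries from the question, read as days of January 2020.
    static List<Procedure> sample() {
        return Arrays.asList(
            new Procedure("Procedure1", jan(1)), new Procedure("Procedure2", jan(3)),
            new Procedure("Procedure3", jan(3)), new Procedure("Procedure1", jan(4)),
            new Procedure("Procedure5", jan(5), jan(2)), new Procedure("Procedure2", jan(6)));
    }

    // Stand-in for step 1: latest first date per name, built with a plain loop.
    static Map<String, Date> maxFirstDates(List<Procedure> procedures) {
        Map<String, Date> result = new HashMap<>();
        for (Procedure p : procedures) {
            result.merge(p.getProcedureName(), p.getProcedureDate().get(0),
                         (a, b) -> a.after(b) ? a : b);
        }
        return result;
    }

    // Steps 2 to 4 as a comparator chain; returns a new list, the input is untouched.
    static List<Procedure> sortGrouped(List<Procedure> procedures, Map<String, Date> maxByName) {
        Comparator<Procedure> byGroupMaxDesc =
            Comparator.comparing((Procedure p) -> maxByName.get(p.getProcedureName())).reversed();
        Comparator<Procedure> byName = Comparator.comparing(Procedure::getProcedureName);
        Comparator<Procedure> byOwnDateDesc =
            Comparator.comparing((Procedure p) -> p.getProcedureDate().get(0)).reversed();

        return procedures.stream()
            .sorted(byGroupMaxDesc.thenComparing(byName).thenComparing(byOwnDateDesc))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Procedure> procedures = sample();
        sortGrouped(procedures, maxFirstDates(procedures)).forEach(System.out::println);
    }
}
```

The explicit (Procedure p) parameter types are there because calling reversed() on the result of Comparator.comparing otherwise leaves the compiler without enough information to infer the lambda's type.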
Sorted result
Using the legacy java.util.Date as in your question, the sorted procedures come out in the order of your expected snippet (I have overridden the Procedure::toString method):
@Override
public String toString() {
     return procedureName + " " + procedureDate;
}
Procedure2 [Mon Jan 06 00:00:00 CET 2020]
Procedure2 [Fri Jan 03 00:00:00 CET 2020]
Procedure5 [Sun Jan 05 00:00:00 CET 2020, Thu Jan 02 00:00:00 CET 2020]
Procedure1 [Sat Jan 04 00:00:00 CET 2020]
Procedure1 [Wed Jan 01 00:00:00 CET 2020]
Procedure3 [Fri Jan 03 00:00:00 CET 2020]
Answer 2
Score: 1
My idea comes from functional programming, which is based on map-reduce. groupingBy/collect is really a form of reduce anyway, and I think this problem is better solved by merging than by the Stream grouping feature. This is my implementation using plain Streams.
List<Procedure> a = List.of(
    new Procedure(...),
    ...
);
List<Procedure> b = a.stream().map(p -> {                   // prepare for reduce: wrap each object in its own Map
    Map<String, Procedure> mapP = new HashMap<>();
    mapP.put(p.getProcedureName(), p);
    return mapP;
}).reduce((p, q) -> {                                       // use reduce to merge the maps
    q.entrySet().forEach(qq -> {
        if (p.containsKey(qq.getKey())) {                   // same name: union the two date lists
            p.get(qq.getKey()).setProcedureDate(
                new ArrayList<Date>(
                    Stream.concat(
                        p.get(qq.getKey()).getProcedureDate().stream(),
                        qq.getValue().getProcedureDate().stream())
                    .collect(Collectors.toSet())));
        } else {
            p.put(qq.getKey(), qq.getValue());
        }
    });
    return p;
}).get().values().stream().map(p -> {                       // sort the dates inside each object (ascending)
    p.setProcedureDate(p.getProcedureDate().stream().sorted().collect(Collectors.toList()));
    return p;
}).sorted((x, y) ->                                         // sort the objects by their first date (ascending)
    x.getProcedureDate().get(0).compareTo(y.getProcedureDate().get(0))
).collect(Collectors.toList());
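For comparison with the reduce-based version, here is a sketch of the Collectors.groupingBy route that keeps same-named entries as separate objects and orders everything newest first, as the question's expected output does. It is not the approach of this answer. The Procedure constructor and getters, and the jan and sample helpers, are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Calendar;
import java.util.Comparator;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.stream.Collectors;

class GroupingBySketch {

    // Minimal stand-in for the question's class; constructor and getters are assumed.
    static class Procedure {
        private final String procedureName;
        private final List<Date> procedureDate;

        Procedure(String procedureName, Date... dates) {
            this.procedureName = procedureName;
            this.procedureDate = Arrays.asList(dates);
        }

        String getProcedureName() { return procedureName; }
        List<Date> getProcedureDate() { return procedureDate; }

        @Override
        public String toString() { return procedureName + " " + procedureDate; }
    }

    // Hypothetical helper: midnight on the given day of January 2020.
    static Date jan(int day) {
        return new GregorianCalendar(2020, Calendar.JANUARY, day).getTime();
    }

    // The six entries from the question, read as days of January 2020.
    static List<Procedure> sample() {
        return Arrays.asList(
            new Procedure("Procedure1", jan(1)), new Procedure("Procedure2", jan(3)),
            new Procedure("Procedure3", jan(3)), new Procedure("Procedure1", jan(4)),
            new Procedure("Procedure5", jan(5), jan(2)), new Procedure("Procedure2", jan(6)));
    }

    static List<Procedure> groupAndSort(List<Procedure> procedures) {
        Comparator<Procedure> byFirstDateDesc =
            Comparator.comparing((Procedure p) -> p.getProcedureDate().get(0)).reversed();

        return procedures.stream()
            // one list per procedure name
            .collect(Collectors.groupingBy(Procedure::getProcedureName))
            .values().stream()
            // inside each group: newest first
            .map(group -> group.stream().sorted(byFirstDateDesc).collect(Collectors.toList()))
            // between groups: compare the newest entry (the head) of each group
            .sorted(Comparator.comparing((List<Procedure> g) -> g.get(0), byFirstDateDesc))
            // flatten the ordered groups back into a single list
            .flatMap(List::stream)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        groupAndSort(sample()).forEach(System.out::println);
    }
}
```

If two groups share the same newest date, their relative order depends on the map's iteration order; adding a tie-break on the name would make it deterministic.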