英文:
Run a For-each loop on a Filtered HashMap
问题
我对Java非常陌生。这是我的问题。
我有一个类型为 `Map<Integer, List<MyObject>>` 的Map,我称其为 `myMap`。
由于 `myMap` 有很多成员(约100,000个),我认为使用 `for` 循环可能不是一个好主意,所以我想要对 `myMap` 进行 `filter`,条件如下:
`myMap.get(i).get(每个成员).MyObject的一个特殊属性 == null`;
其中 `每个成员` 意味着我想要删除 `myMap` 的成员,这些成员在该属性(我们称其为 `myAttribute` 更方便)上的整个列表的成员(所有对象)都为null。
我一个未完成的想法是这样的:
```java
Map<Integer, List<toHandle>> collect = myMap.entrySet().stream()
.filter(x -> x.getValue().stream().anyMatch(y -> y.get特殊属性() == null))
.collect(Collectors.toMap(x -> x.getKey(), x -> x.getValue()));
任何帮助将不胜感激。谢谢。
<details>
<summary>英文:</summary>
I am so new to java. and there is my problem.
I have a Map in Type of `Map<Integer , List<MyObject>>` that I call it `myMap`.
As `myMap` has a lot of members (About 100000) , I don't think the `for` loop to be such a good idea so I wanna `filter` my `Map<Integer , List<MyObject>>` Where the bellow condition happens:
`myMap.get(i).get(every_one_of_them).a_special_attribute_of_my_MyObject == null`;
in which `every_one_of_them` means i wanna to delete members of `myMap` which the Whole list's members(All of its Objects) are null in that attribute(for more comfort , let's call it `myAttribute`).
one of my uncompleted idea was such a thing:
Map<Integer, List<toHandle>> collect = myMap.entrySet().stream()
.filter(x -> x.getValue.HERE_IS_WHERE_I_DO_NOT_KNOW_HOW_TO)
.collect(Collectors.toMap(x -> x.getKey(), x -> x.getValue()));
Any Help Will Be Highly Appreciated. Thanks.
</details>
# 答案1
**得分**: 3
你可以:
- 遍历映射的 [`values()`][1] 并从中删除你不想要的元素。你可以使用 `removeIf(Predicate condition)` 来实现这一点。
- 要检查列表中的所有元素是否满足某个条件,你可以使用 `list.stream().allMatch(Predicate condition)`。
例如,我们有一个 `Map<Integer, List<String>>`,我们想要删除那些所有字符串都以 `b` 或 `B` 开头的列表。你可以通过以下方式实现:
```java
myMap.values()
.removeIf(list -> list.stream()
.allMatch(str -> str.toLowerCase().startsWith("b"))
// 但在实际应用中,为了更好的性能,可以使用
// .allMatch(str -> str.regionMatches(true, 0, "b", 0, 1))
);
演示:
Map<Integer, List<String>> myMap = new HashMap<>(Map.of(
1, List.of("Abc", "Ab"),
2, List.of("Bb", "Bc"),
3, List.of("Cc")
));
myMap.values()
.removeIf(list -> list.stream()
.allMatch(str -> str.toLowerCase().startsWith("b"))
);
System.out.println(myMap);
输出:
{1=[Abc, Ab], 3=[Cc]}
英文:
You can
- iterate over map
values()
and remove from it elements which you don't want. You can use for thatremoveIf(Predicate condition)
. - To check if all elements in list fulfill some condition you can use
list.stream().allMatch(Predicate condition)
For instance lets we have Map<Integer, List<String>>
and we want to remove lists which have all strings starting with b
or B
. You can do it via
myMap.values()
.removeIf(list -> list.stream()
.allMatch(str -> str.toLowerCase().startsWith("b"))
// but in real application for better performance use
// .allMatch(str -> str.regionMatches(true, 0, "b", 0, 1))
);
DEMO:
Map<Integer , List<String>> myMap = new HashMap<>(Map.of(
1, List.of("Abc", "Ab"),
2, List.of("Bb", "Bc"),
3, List.of("Cc")
));
myMap.values()
.removeIf(list -> list.stream()
.allMatch(str -> str.toLowerCase().startsWith("b"))
);
System.out.println(myMap);
Output:
{1=[Abc, Ab], 3=[Cc]}
答案2
得分: 2
因为 myMap 有许多成员(约 100000),我认为使用 for 循环不是一个好主意,所以我想要过滤
这听起来好像你认为 stream.filter 比 foreach 更快。事实并非如此;它要么更慢,要么速度差不多。
剧透:最后我进行了一些基本的性能测试,但我邀请任何人都可以参与那个测试,并将其升级为一个完整的 JMH 测试套件,并在各种硬件上运行它。然而 - 根据测试结果,你实际上是错的,foreach 比任何涉及流的操作要快得多。
另外,你好像觉得 100000 条记录是很多的。实际上并不是。使用 foreach 循环(或者更确切地说,使用迭代器)会更快。使用迭代器进行移除操作会更快。
并行处理可以在这里帮助你,而且使用流更简单,但你不能只是在代码中添加一个 parallel()
,然后相信它会奏效。它取决于底层的类型。例如,你的普通 j.u.HashMap 在这方面表现不太好;像是 ConcurrentHashMap 这样的东西更加强大。但是如果你花时间将所有数据复制到更适合的映射类型中,在这段时间内你可能已经完成了整个任务,而且可能更快!(取决于这些列表有多大)。
步骤 1:创建一个 oracle 函数
但是,首先,我们需要一个 oracle 函数:用于确定是否应该删除给定的条目。无论你选择什么解决方案,这都是必需的:
public boolean keep(List<MyObject> mo) {
for (MyObject obj : mo) if (obj.specialProperty != null) return true;
return false;
}
你可以将其改写为流式形式:
public boolean keep(List<MyObject> mo) {
return mo.stream().anyMatch(o -> o.specialProperty != null);
}
步骤 2:过滤列表
有了这个函数,任务就变得更容易了:
var it = map.values().iterator();
while (it.hasNext()) if (!keep(it.next())) it.remove();
现在你所需要的就是这个。我们也可以将其改写为使用流,但是请注意,你不能使用流来直接修改映射,而且复制操作通常会比较慢,因此这可能会更慢,而且肯定会消耗更多的内存:
Map<Integer, List<MyObject>> result =
map.entrySet().stream()
.filter(e -> keep(e.getValue()))
.collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));
此外,请注意流选项通常不会导致代码显著变得更短。不要基于流本质上更好,或者会导致代码更可读的观念来做出流与非流的决策。很遗憾,编程并不是那么简单。
我们还可以在 map 本身上使用一些更加函数式的方法:
map.values().removeIf(v -> !keep(v));
这似乎是最佳的选择,虽然我们不得不通过 values()
来进行 "bounce";map 本身没有 removeIf
方法,但由 keySet、values、entrySet 等返回的集合会将任何更改反映回映射中,因此这是可以工作的。
来进行性能测试吧!
性能测试很棘手,为了获得良好的结果,实际上需要使用 JMH。但是,让我们进行一次快速的扫描:
import java.util.*;
import java.util.stream.*;
public class Test {
// ...(此处省略部分代码)
}
这个测试,在我的硬件上是可靠的,显示了流的方式远远比其他方式更慢,而顺序处理选项在某些百分比上略胜于 removeIf 变种。这恰好表明,你最初的观点(如果我可以将其理解为 "我认为 foreach 太慢")完全是错误的,幸好如此。
为了好玩,我将 map 替换为 ConcurrentHashMap
,并使流使用 parallel()
。这没有显著改变时间,但实际上我也没有期望它会改变。
关于风格的一点说明
在各种代码片段中,我省略了循环和 if 语句的大括号。如果你添加了它们,非流式的代码将占据更多的行数,如果还包括循环和 if 结构内部的缩进空格,那么它的 "表面积" 会更大。然而,以此为线索是荒谬的 - 这相当于说:"实际上,Java 的常用样式指南非常晦涩和考虑不周。然而,我不敢打破它们。幸运的是,Lambda 出现了,给了我一个借口,可以将这些样式指南的原则扔到窗外,现在可以将所有东西堆叠成一个单一的、没有大括号的行,哦,看起来,Lambda 会导致代码更短!"。我会假设任何阅读者,在掌握这些知识的情况下,都能轻松识破这种蹩脚的论点。添加大括号主要是为了更容易的调试断点和在给定 "代码节点" 中添加额外操作的简便方式,而这些
英文:
> As myMap has a lot of members (About 100000) , I don't think the for loop to be such a good idea so I wanna filter
That sounds like you think stream.filter is somehow faster than foreach. It's not; it's either slower or about as fast.
SPOILER: All the way at the end I do some basic performance tests, but I invite anyone to take that test and upgrade it to a full JMH test suite and run it on a variety of hardware. However - it says you're in fact exactly wrong, and foreach is considerably faster than anything involving streams.
Also, it sounds like you feel 100000 is a lot of entries. It mostly isn't. a foreach loop (or rather, an iterator) will be faster. Removing with the iterator will be considerably faster.
parallelism can help you out here, and is simpler with streams, but you can't just slap a parallel()
in there and trust that it'll just work out. It depends on the underlying types. For example, your plain jane j.u.HashMap isn't very good at this; Something like a ConcurrentHashMap is far more capable. But if you take the time to copy over all data to a more suitable map type, well, in that timespan you could have done the entire job, and probably faster to boot! (Depends on how large those lists are).
Step 1: Make an oracle
But, first things first, we need an oracle function: One that determines if a given entry ought to be deleted. No matter what solution you go with, this is required:
public boolean keep(List<MyObject> mo) {
for (MyObject obj : mo) if (obj.specialProperty != null) return true;
return false;
}
you could 'streamify' it:
public boolean keep(List<MyObject> mo) {
return mo.stream().anyMatch(o -> o.specialProperty != null);
}
Step 2: Filter the list
Once we have that, the task becomes easier:
var it = map.values().iterator();
while (it.hasNext()) if (!keep(it.next())) it.remove();
is now all you need. We can streamify that if you prefer, but note that you can't use streams to change a map 'in place', and copying over is usually considerably slower, so, this is likely slower and certainly takes more memory:
Map<Integer, List<MyObject>> result =
map.entrySet().stream()
.filter(e -> keep(e.getValue()))
.collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));
Note also how the stream option doesn't generally result in significantly shorter code either. Don't make the decision between stream or non-stream based on notions that streams are inherently better, or lead to more readable code. Programming just isn't that simple, I'm afraid.
We can also use some of the more functional methods in map itself:
map.values().removeIf(v -> !keep(v));
That seems like the clear winner, here, although it's a bit bizarre we have to 'bounce' through values()
; map itself has no removeIf
method, but the collections returned by keySet, values, entrySet etc reflect any changes back to the map, so that works out.
Let's performance test!
Performance testing is tricky and really requires using JMH for good results. By all means, as an exercise, do just that. But, let's just do a real quick scan:
import java.util.*;
import java.util.stream.*;
public class Test {
static class MyObj {
String foo;
}
public static MyObj hit() {
MyObj o = new MyObj();
o.foo = "";
return o;
}
public static MyObj miss() {
return new MyObj();
}
private static final int MAP_ELEMS = 100000;
private static final int LIST_ELEMS = 50;
private static final double HIT_OR_MISS = 0.01;
private static final Random rnd = new Random();
public static void main(String[] args) {
var map = construct();
long now = System.currentTimeMillis();
filter_seq(map);
long delta = System.currentTimeMillis() - now;
System.out.printf("Sequential: %.3f\n", 0.001 * delta);
map = construct();
now = System.currentTimeMillis();
filter_stream(map);
delta = System.currentTimeMillis() - now;
System.out.printf("Stream: %.3f\n", 0.001 * delta);
map = construct();
now = System.currentTimeMillis();
filter_removeIf(map);
delta = System.currentTimeMillis() - now;
System.out.printf("RemoveIf: %.3f\n", 0.001 * delta);
}
private static Map<Integer, List<MyObj>> construct() {
var m = new HashMap<Integer, List<MyObj>>();
for (int i = 0; i < MAP_ELEMS; i++) {
var list = new ArrayList<MyObj>();
for (int j = 0; j < LIST_ELEMS; j++) {
list.add(rnd.nextDouble() < HIT_OR_MISS ? hit() : miss());
}
m.put(i, list);
}
return m;
}
static boolean keep_seq(List<MyObj> list) {
for (MyObj o : list) if (o.foo != null) return true;
return false;
}
static boolean keep_stream(List<MyObj> list) {
return list.stream().anyMatch(o -> o.foo != null);
}
static void filter_seq(Map<Integer, List<MyObj>> map) {
var it = map.values().iterator();
while (it.hasNext()) if (!keep_seq(it.next())) it.remove();
}
static void filter_stream(Map<Integer, List<MyObj>> map) {
Map<Integer, List<MyObj>> result =
map.entrySet().stream()
.filter(e -> keep_stream(e.getValue()))
.collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));
}
static void filter_removeIf(Map<Integer, List<MyObj>> map) {
map.values().removeIf(v -> !keep_stream(v));
}
}
This, reliably, on my hardware anyway, shows that the stream route is by far the slowest, and the sequential option wins out with some percent from the removeIf variant. Which just goes to show that your initial line (if I can take that as 'I think foreach is too slow') was entirely off the mark, fortunately.
For fun I replaced the map with a ConcurrentHashMap
and made the stream parallel()
. This did not change the timing significantly, and I wasn't really expecting it too.
A note about style
In various snippets, I omit braces for loops and if statements. If you add them, the non-stream-based code occupies considerably more lines, and if you include the indent whitespace for the insides of these constructs, considerably more 'surface area' of paste. However, that is a ridiculous thing to clue off of - that is tantamount to saying: "Actually, the commonly followed style guides for java are incredibly obtuse and badly considered. However, I dare not break them. Fortunately, lambdas came along and gave me an excuse to toss the entire principle of those style guides right out the window and now pile it all into a single, braceless line, and oh look, lambdas lead to shorter code!". I would assume any reader, armed with this knowledge, can easily pierce through such baloney argumentation. The reasons for those braces primarily involve easier debug breakpointing and easy ways to add additional actions to a given 'code node', and those needs are exactly as important, if not more so, if using streams. If it's okay to one-liner and go brace-free for lambdas, then surely it is okay to do the same to if
and for
bodies.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论