When and how to perform one to 0..n mapping Stream mapMulti over flatMap
Question
I have been skimming through the news and the source code of the newest LTS Java 17 version, and I have encountered a new Stream method called mapMulti. The early-access JavaDoc says it is similar to flatMap.
<R> Stream<R> mapMulti(BiConsumer<? super T,? super Consumer<R>> mapper)
- How to perform one to 0..n mapping using this method?
- How does the new method work and how does it differ from flatMap? When is each one preferable?
- How many times can the mapper be called?
Answer 1
Score: 41
Stream::mapMulti is a new method that is classified as an intermediate operation.
It requires a BiConsumer<T, Consumer<R>> mapper that receives the element about to be processed and a Consumer. The latter makes the method look strange at first glance, because it differs from what we are used to in the other intermediate methods such as map, filter, or peek, none of which uses any variation of *Consumer.
The purpose of the Consumer, provided right within the lambda expression by the API itself, is to accept any number of elements to be made available in the subsequent pipeline. Therefore, all the accepted elements, regardless of how many, will be propagated.
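To make the division of labor concrete (and to address how often the mapper runs), here is a small sketch. For a sequential stream that is fully traversed, the mapper runs once per upstream element, while the consumer it receives may be invoked any number of times, including zero. The class name MapperCallCount and the character-splitting rule are made up for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class MapperCallCount {
    // Counts how often the mapper lambda itself is invoked
    static final AtomicInteger mapperCalls = new AtomicInteger();

    // Splits each word into one-character strings; an empty word yields nothing
    static List<String> expand(List<String> words) {
        return words.stream()
            .<String>mapMulti((word, downstream) -> {
                mapperCalls.incrementAndGet();              // once per upstream element
                for (char ch : word.toCharArray()) {
                    downstream.accept(String.valueOf(ch));  // any number of times, incl. zero
                }
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("ab", "", "xyz")));  // [a, b, x, y, z]
        System.out.println(mapperCalls.get());                 // 3
    }
}
```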
Explanation using simple snippets
- One to zero-or-one (0..1) mapping (similar to filter)
Using consumer.accept(R r) for only a few selected items achieves a filter-like pipeline. This can be useful when checking the element against a predicate and mapping it to a different value, which would otherwise be done using a combination of filter and map. The following prints only the lengths larger than 4:
Stream.of("Java", "Python", "JavaScript", "C#", "Ruby")
    .mapMulti((str, consumer) -> {
        if (str.length() > 4) {
            consumer.accept(str.length());  // lengths larger than 4
        }
    })
    .forEach(i -> System.out.print(i + " "));
// 6 10
- One to one mapping (similar to map)
Working with the previous example, when the condition is omitted and every element is mapped into a new one and accepted using the consumer, the method effectively behaves like map:
Stream.of("Java", "Python", "JavaScript", "C#", "Ruby")
    .mapMulti((str, consumer) -> consumer.accept(str.length()))
    .forEach(i -> System.out.print(i + " "));
// 4 6 10 2 4
- One to many mapping (similar to flatMap)
Here things get interesting because one can call consumer.accept(R r) any number of times. Let's say we want to replicate the number representing the String length by itself, i.e. 2 becomes 2, 2; 4 becomes 4, 4, 4, 4; and 0 becomes nothing.
Stream.of("Java", "Python", "JavaScript", "C#", "Ruby", "")
    .mapMulti((str, consumer) -> {
        for (int i = 0; i < str.length(); i++) {
            consumer.accept(str.length());
        }
    })
    .forEach(i -> System.out.print(i + " "));
// 4 4 4 4 6 6 6 6 6 6 10 10 10 10 10 10 10 10 10 10 2 2 4 4 4 4
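One practical detail the snippets above sidestep: they only print their results, so R is silently inferred as Object. When the result needs a concrete element type, an explicit type witness is usually required, because an implicitly typed lambda gives the compiler nothing to infer R from (the JavaDoc's own example likewise writes numbers.<Integer>mapMulti(...)). A minimal sketch; the helper name lengthsOverFour is hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TypeWitness {
    // Hypothetical helper: keeps the lengths of words longer than 4 characters
    static List<Integer> lengthsOverFour(Stream<String> words) {
        return words
            // Without <Integer>, R would be inferred as Object and the
            // result would be a List<Object>, not a List<Integer>
            .<Integer>mapMulti((str, consumer) -> {
                if (str.length() > 4) {
                    consumer.accept(str.length());
                }
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lengthsOverFour(Stream.of("Java", "Python", "JavaScript", "C#", "Ruby")));
        // [6, 10]
    }
}
```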
Comparison with flatMap
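The very idea of this mechanism is that the consumer can be called multiple times (including zero). The JDK's stream pipeline pushes the accepted elements straight into a single flattened Stream, without creating a new Stream instance for every group of output elements as flatMap requires. (The default method in the Stream interface, used by implementations that do not override it, instead collects the elements into an internal SpinedBuffer and hands a stream over that buffer to flatMap.)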
The JavaDoc states two use cases in which this method is preferable over flatMap:
> - When replacing each stream element with a small (possibly zero) number of elements. Using this method avoids the overhead of creating a new Stream instance for every group of result elements, as required by flatMap.
> - When it is easier to use an imperative approach for generating result elements than it is to return them in the form of a Stream.
Performance-wise, the new mapMulti method is the winner in such cases. Check out the benchmark at the bottom of this answer.
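As a rough illustration of the first JavaDoc use case (a small, possibly zero, number of replacement elements per input), the following sketch produces the same list both ways. The "keep even numbers and their negation" rule is an arbitrary example: the flatMap version has to build a Stream (empty or not) for every element, whereas the mapMulti version just calls the consumer.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapVsMapMulti {
    // flatMap: a (possibly empty) Stream is created for every upstream element
    static List<Integer> withFlatMap(List<Integer> input) {
        return input.stream()
            .flatMap(n -> n % 2 == 0 ? Stream.of(n, -n) : Stream.<Integer>empty())
            .collect(Collectors.toList());
    }

    // mapMulti: the replacement elements are pushed to the consumer directly
    static List<Integer> withMapMulti(List<Integer> input) {
        return input.stream()
            .<Integer>mapMulti((n, consumer) -> {
                if (n % 2 == 0) {
                    consumer.accept(n);
                    consumer.accept(-n);
                }
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4);
        System.out.println(withFlatMap(input));   // [2, -2, 4, -4]
        System.out.println(withMapMulti(input));  // [2, -2, 4, -4]
    }
}
```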
Filter-map scenario
Using this method instead of filter or map alone doesn't make sense due to its verbosity and the fact that one intermediate stream is created anyway. The exception might be replacing a .filter(..).map(..) chain, which comes in handy in cases such as checking an element's type and casting it.
int sum = Stream.of(1, 2.0, 3.0, 4F, 5, 6L)
    .mapMultiToInt((number, consumer) -> {
        if (number instanceof Integer) {
            consumer.accept((Integer) number);
        }
    })
    .sum();
// 6

// equivalent using filter and mapToInt:
int sum = Stream.of(1, 2.0, 3.0, 4F, 5, 6L)
    .filter(number -> number instanceof Integer)
    .mapToInt(number -> (Integer) number)
    .sum();
As seen above with mapMultiToInt, primitive-specialized variations (mapMultiToInt, mapMultiToLong, and mapMultiToDouble) were introduced. They come along with the mapMulti methods within the primitive streams, such as IntStream mapMulti(IntStream.IntMapMultiConsumer mapper). Also, three new functional interfaces were introduced (IntStream.IntMapMultiConsumer, LongStream.LongMapMultiConsumer, and DoubleStream.DoubleMapMultiConsumer). Basically, they are the primitive variations of BiConsumer<T, Consumer<R>>, for example:
@FunctionalInterface
interface IntMapMultiConsumer {
    void accept(int value, IntConsumer ic);
}
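For the primitive specializations, here is a small sketch using IntStream.mapMulti with an IntMapMultiConsumer lambda; the values stay int throughout, so no boxing is involved. The "repeat each value n times" rule mirrors the string-length example earlier and is only for illustration; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class IntMapMultiExample {
    // IntStream.mapMulti takes an IntStream.IntMapMultiConsumer:
    // (int value, IntConsumer ic) -> void
    static int[] repeatEachValue(int upTo) {
        return IntStream.rangeClosed(1, upTo)
            .mapMulti((value, ic) -> {
                for (int i = 0; i < value; i++) {
                    ic.accept(value);   // emit value exactly value times
                }
            })
            .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(repeatEachValue(3)));  // [1, 2, 2, 3, 3, 3]
    }
}
```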
Combined real use-case scenario
The real power of this method is its flexibility of usage and creating only one Stream at a time, which is the major advantage over flatMap. The two snippets below represent a flatmapping of Product and its List<Variation> into 0..n offers represented by the Offer class, based on certain conditions (product category and variation availability).
- Product with String name, int basePrice, String category and List<Variation> variations.
- Variation with String name, int price and boolean availability.
List<Product> products = ...
List<Offer> offers = products.stream()
    // explicit <Offer> witness: otherwise R is inferred as Object
    // and the collected List<Object> cannot be assigned to List<Offer>
    .<Offer>mapMulti((product, consumer) -> {
        if ("PRODUCT_CATEGORY".equals(product.getCategory())) {
            for (Variation v : product.getVariations()) {
                if (v.isAvailable()) {
                    Offer offer = new Offer(
                        product.getName() + "_" + v.getName(),
                        product.getBasePrice() + v.getPrice());
                    consumer.accept(offer);
                }
            }
        }
    })
    .collect(Collectors.toList());
List<Product> products = ...
List<Offer> offers = products.stream()
    .filter(product -> "PRODUCT_CATEGORY".equals(product.getCategory()))
    .flatMap(product -> product.getVariations().stream()
        .filter(Variation::isAvailable)
        .map(v -> new Offer(
            product.getName() + "_" + v.getName(),
            product.getBasePrice() + v.getPrice()
        ))
    )
    .collect(Collectors.toList());
The use of mapMulti is more imperatively inclined compared to the declarative approach of combining the Stream methods of previous versions (flatMap, map, and filter), as seen in the latter snippet. From this perspective, whether an imperative approach is easier depends on the use case. Recursion is a good example, described in the JavaDoc.
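The JavaDoc's recursion example expands arbitrarily nested Iterables into their leaf elements. Below is a sketch along the same lines; the flatten helper and the class name are my additions, and the pattern-matching instanceof requires Java 16 or later (which mapMulti needs anyway).

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

public class ExpandNested {
    // Recursively pushes every non-null, non-Iterable leaf to the consumer
    static void expandIterable(Object e, Consumer<Object> c) {
        if (e instanceof Iterable<?> elements) {
            for (Object ie : elements) {
                expandIterable(ie, c);
            }
        } else if (e != null) {
            c.accept(e);
        }
    }

    // Hypothetical convenience wrapper around the recursive mapper
    static List<Object> flatten(List<?> nested) {
        return nested.stream()
            .mapMulti(ExpandNested::expandIterable)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        var nestedList = List.of(1, List.of(2, List.of(3, 4)), 5);
        System.out.println(flatten(nestedList));  // [1, 2, 3, 4, 5]
    }
}
```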
Benchmark
As promised, I have written a bunch of micro-benchmarks based on ideas collected from the comments. Since there is quite a lot of code to publish, I have created a GitHub repository with the implementation details and share only the results here.
Stream::flatMap(Function) vs Stream::mapMulti(BiConsumer)
Source
Here we can see a huge difference, which supports the claim that the newer method works as described and avoids the overhead of creating a new Stream instance for each processed element.
Benchmark                  Mode  Cnt   Score   Error  Units
MapMulti_FlatMap.flatMap   avgt   25  73.852 ± 3.433  ns/op
MapMulti_FlatMap.mapMulti  avgt   25  17.495 ± 0.476  ns/op
Stream::filter(Predicate).map(Function) vs Stream::mapMulti(BiConsumer)
Source
Using chained pipelines (not nested, though) is fine: both variants perform about the same, within the error margin.
Benchmark                     Mode  Cnt   Score   Error  Units
MapMulti_FilterMap.filterMap  avgt   25   7.973 ± 0.378  ns/op
MapMulti_FilterMap.mapMulti   avgt   25   7.765 ± 0.633  ns/op
Stream::flatMap(Function) with Optional::stream() vs Stream::mapMulti(BiConsumer)
Source
This one is very interesting, especially in terms of usage (see the source code): we are now able to flatten using mapMulti(Optional::ifPresent), and, as expected, the new method is faster in this case (about twice as fast in this run).
Benchmark                           Mode  Cnt   Score   Error  Units
MapMulti_FlatMap_Optional.flatMap   avgt   25  20.186 ± 1.305  ns/op
MapMulti_FlatMap_Optional.mapMulti  avgt   25  10.498 ± 0.403  ns/op
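For reference, the Optional-flattening idiom from the last benchmark looks roughly like this. The benchmark's actual source lives in the author's repository and is not reproduced here; this is only a sketch of the two styles, with hypothetical method names. The explicit <String> witness keeps the mapMulti result typed as Stream<String>.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class OptionalFlatten {
    // Pre-mapMulti idiom: one (possibly empty) Stream per Optional
    static List<String> viaFlatMap(List<Optional<String>> input) {
        return input.stream()
            .flatMap(Optional::stream)
            .collect(Collectors.toList());
    }

    // Optional::ifPresent fits BiConsumer<Optional<String>, Consumer<String>>
    static List<String> viaMapMulti(List<Optional<String>> input) {
        return input.stream()
            .<String>mapMulti(Optional::ifPresent)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Optional<String>> input = List.of(Optional.of("a"), Optional.empty(), Optional.of("b"));
        System.out.println(viaFlatMap(input));   // [a, b]
        System.out.println(viaMapMulti(input));  // [a, b]
    }
}
```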
Answer 2
Score: 11
To address the scenario
> When it is easier to use an imperative approach for generating result elements than it is to return them in the form of a Stream.
we can see it as now having a limited variant of C#'s yield statement. The limitations are that we always need an initial input from a stream, as this is an intermediate operation, and that there's no short-circuiting for the elements we're pushing in one function evaluation.
Still, it opens interesting opportunities.
E.g., implementing a stream of Fibonacci numbers formerly required a solution using temporary objects capable of holding two values.
Now, we can use something like:
IntStream.of(0)
    .mapMulti((a, c) -> {
        for (int b = 1; a >= 0; b = a + (a = b))
            c.accept(a);
    })
    /* additional stream operations here */
    .forEach(System.out::println);
It stops when the int values overflow. As said, it won't short-circuit when we use a terminal operation that does not consume all values; however, this loop producing then-ignored values might still be faster than the other approaches.
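To see the missing short-circuiting in action, the sketch below counts how many values the mapper pushes while limit(10) keeps only the first ten. On OpenJDK, where the mapper is handed the downstream sink directly, the counter should end at 47 (every Fibonacci number that fits in an int) rather than 10; treat that count as implementation behavior, not an API guarantee. The counter and class name are additions for illustration only.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class FibonacciNoShortCircuit {
    // Counts every value the mapper pushes, including ones limit() later discards
    static final AtomicInteger pushed = new AtomicInteger();

    static int[] firstFibonacci(int n) {
        return IntStream.of(0)
            .mapMulti((a, c) -> {
                // Same loop as above: b = a + (a = b) advances the pair (a, b)
                for (int b = 1; a >= 0; b = a + (a = b)) {
                    pushed.incrementAndGet();
                    c.accept(a);
                }
            })
            .limit(n)
            .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(firstFibonacci(10)));
        // [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
        System.out.println(pushed.get());
    }
}
```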
Another example inspired by this answer, to iterate over a class hierarchy from root to most specific:
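import java.util.LinkedHashMap;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class MapMultiExamples {
    public static void main(String[] args) {
        Stream.of(LinkedHashMap.class).mapMulti(MapMultiExamples::hierarchy)
            /* additional stream operations here */
            .forEach(System.out::println);
    }

    // Recurses to the root first, so classes are emitted from Object down to cl
    static void hierarchy(Class<?> cl, Consumer<? super Class<?>> co) {
        if (cl != null) {
            hierarchy(cl.getSuperclass(), co);
            co.accept(cl);
        }
    }
}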
which, unlike the old approaches, does not require additional heap storage and will likely run faster (assuming reasonable class depths that do not make recursion backfire).
Also monsters like this
>
> List<A> list = IntStream.range(0, r_i).boxed()
>     .flatMap(i -> IntStream.range(0, r_j).boxed()
>         .flatMap(j -> IntStream.range(0, r_k)
>             .mapToObj(k -> new A(i, j, k))))
>     .collect(Collectors.toList());
>
can now be written like
List<A> list = IntStream.range(0, r_i).boxed()
    .<A>mapMulti((i, c) -> {
        for (int j = 0; j < r_j; j++) {
            for (int k = 0; k < r_k; k++) {
                c.accept(new A(i, j, k));
            }
        }
    })
    .collect(Collectors.toList());
Compared to the nested flatMap steps, it loses some parallelism opportunity, which the reference implementation didn't exploit anyway. For a non-short-circuiting operation like the one above, the new method will likely benefit from reduced boxing and fewer instantiations of capturing lambda expressions. But of course, it should be used judiciously, not to rewrite every construct into an imperative version (after so many people tried to rewrite every imperative code into a functional version)…
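A self-contained sketch comparing both versions, assuming A is a simple value type: here it is a hypothetical record A(int i, int j, int k), and the bounds r_i, r_j, r_k become method parameters. For these sequential pipelines both versions produce the same elements in the same order.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class NestedLoops {
    // Hypothetical stand-in for the answer's class A
    record A(int i, int j, int k) {}

    static List<A> withFlatMap(int rI, int rJ, int rK) {
        return IntStream.range(0, rI).boxed()
            .flatMap(i -> IntStream.range(0, rJ).boxed()
                .flatMap(j -> IntStream.range(0, rK)
                    .mapToObj(k -> new A(i, j, k))))
            .collect(Collectors.toList());
    }

    static List<A> withMapMulti(int rI, int rJ, int rK) {
        return IntStream.range(0, rI).boxed()
            .<A>mapMulti((i, c) -> {
                for (int j = 0; j < rJ; j++) {
                    for (int k = 0; k < rK; k++) {
                        c.accept(new A(i, j, k));
                    }
                }
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<A> a = withFlatMap(2, 3, 4);
        List<A> b = withMapMulti(2, 3, 4);
        System.out.println(a.size() + " " + a.equals(b));  // 24 true
    }
}
```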