When and how to perform one to 0..n mapping Stream mapMulti over flatMap

Question

I have been skimming through the news and the source code of the newest LTS version, Java 17, and came across a new Stream method called mapMulti (added in Java 16). The early-access JavaDoc says it is similar to flatMap.

<R> Stream<R> mapMulti(BiConsumer<? super T, ? super Consumer<R>> mapper)

  • How can this method be used to perform a one-to-0..n mapping?
  • How does the new method work, and how does it differ from flatMap? When is each one preferable?
  • How many times can the mapper be called?

Answer 1

Score: 41

Stream::mapMulti is a new method that is classified as an intermediate operation.

It requires a BiConsumer<T, Consumer<R>> mapper that receives the element about to be processed and a Consumer. The latter makes the method look strange at first glance, because it differs from what we are used to in other intermediate methods such as map, filter, or peek. None of them hands our lambda a *Consumer for passing elements downstream. peek does take a Consumer, but only as a side-effecting action on each element.

The API itself passes this Consumer into the lambda expression. Its purpose is to accept any number of elements that should be available to the subsequent pipeline. Therefore, all the elements, regardless of how many, will be propagated.
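
This also answers the question of how many times the mapper is called. A small sketch, with made-up class and method names, shows the distinction. The mapper runs once for each element that reaches the mapMulti stage. The consumer it receives may be called any number of times, including zero. The <Integer> type witness is there so the result collects into a List<Integer>.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class MapperCallCount {
    // Returns {number of mapper invocations, number of emitted elements}.
    static int[] countCalls(List<String> words) {
        AtomicInteger mapperCalls = new AtomicInteger();
        List<Integer> emitted = words.stream()
                .<Integer>mapMulti((word, downstream) -> {
                    mapperCalls.incrementAndGet();          // once per incoming element
                    for (int i = 0; i < word.length(); i++) {
                        downstream.accept(word.length());  // 0..n times per element
                    }
                })
                .collect(Collectors.toList());
        return new int[] {mapperCalls.get(), emitted.size()};
    }

    public static void main(String[] args) {
        int[] counts = countCalls(List.of("ab", "", "xyz"));
        // prints "3 mapper calls, 5 elements"
        System.out.println(counts[0] + " mapper calls, " + counts[1] + " elements");
    }
}
```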

Explanation using simple snippets

  • One to some (0..1) mapping (similar to filter)

    Calling consumer.accept(R r) for only a few selected items achieves a filter-like pipeline. This is useful when checking an element against a predicate and mapping it to a different value, which would otherwise be done with a combination of filter and map. For example:

    Stream.of("Java", "Python", "JavaScript", "C#", "Ruby")
          .mapMulti((str, consumer) -> {
              if (str.length() > 4) {
                  consumer.accept(str.length());  // lengths larger than 4
              }
          })
          .forEach(i -> System.out.print(i + " "));
    
    // 6 10
    
  • One to one mapping (similar to map)

    Building on the previous example: when the condition is omitted and every element is mapped to a new value and passed to the consumer, the method effectively behaves like map:

    Stream.of("Java", "Python", "JavaScript", "C#", "Ruby")
          .mapMulti((str, consumer) -> consumer.accept(str.length()))
          .forEach(i -> System.out.print(i + " "));
    
    // 4 6 10 2 4
    
  • One to many mapping (similar to flatMap)

    Here things get interesting, because one can call consumer.accept(R r) any number of times. Let's say we want to repeat the number representing the String length as many times as its value, i.e. 2 becomes 2, 2; 4 becomes 4, 4, 4, 4; and 0 becomes nothing.

    Stream.of("Java", "Python", "JavaScript", "C#", "Ruby", "")
          .mapMulti((str, consumer) -> {
              for (int i = 0; i < str.length(); i++) {
                  consumer.accept(str.length());
              }
          })
          .forEach(i -> System.out.print(i + " "));
    
    // 4 4 4 4 6 6 6 6 6 6 10 10 10 10 10 10 10 10 10 10 2 2 4 4 4 4 
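
The snippets above hide one caveat. Their lambdas are implicitly typed, so the compiler does not infer R from the lambda body, and R ends up as Object. That is harmless for forEach, but collecting into a typed list needs an explicit type witness. Here is a minimal sketch of the filter-like case (class and method names are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

class TypedMapMulti {
    static List<Integer> longWordLengths(List<String> words) {
        return words.stream()
                // without <Integer>, R would be inferred as Object, collect(...)
                // would yield a List<Object>, and this return would not compile
                .<Integer>mapMulti((str, consumer) -> {
                    if (str.length() > 4) {
                        consumer.accept(str.length());
                    }
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [6, 10]
        System.out.println(longWordLengths(List.of("Java", "Python", "JavaScript", "C#", "Ruby")));
    }
}
```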
    
    

Comparison with flatMap

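The core idea of this mechanism is that the consumer can be called any number of times (including zero). The elements are pushed into a single flattened Stream, without creating a new Stream instance for every group of output elements as flatMap requires. A SpinedBuffer is involved only in the default method of the Stream interface, which delegates to flatMap over an internal buffer. The JDK's built-in pipelines override mapMulti and hand the downstream consumer to the mapper directly. The JavaDoc states two use cases in which this method is preferable over flatMap: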

> - When replacing each stream element with a small (possibly zero) number of elements. Using this method avoids the overhead of creating a new Stream instance for every group of result elements, as required by flatMap.
> - When it is easier to use an imperative approach for generating result elements than it is to return them in the form of a Stream.
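
The difference can be sketched with the replication example from above. withFlatMap builds one Stream per input element (here via Collections.nCopies), and withMapMulti pushes directly into the consumer. viaBuffer is a hypothetical helper that mimics what the JavaDoc specifies for the default Stream.mapMulti implementation: call the mapper with a buffering Consumer, then return a stream over that buffer to flatMap. An ArrayList stands in for the JDK-internal SpinedBuffer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMapVsMapMulti {
    // flatMap: a new Stream instance is created for every input element
    static List<Integer> withFlatMap(List<String> words) {
        return words.stream()
                .flatMap(w -> Collections.nCopies(w.length(), w.length()).stream())
                .collect(Collectors.toList());
    }

    // mapMulti: results are pushed into the consumer, no per-element Stream
    static List<Integer> withMapMulti(List<String> words) {
        return words.stream()
                .<Integer>mapMulti((w, downstream) -> {
                    for (int i = 0; i < w.length(); i++) {
                        downstream.accept(w.length());
                    }
                })
                .collect(Collectors.toList());
    }

    // Hypothetical helper mirroring the documented default implementation:
    // run the mapper against a buffer, then return a stream over that buffer.
    static <T, R> Function<T, Stream<R>> viaBuffer(
            BiConsumer<? super T, ? super Consumer<R>> mapper) {
        return element -> {
            List<R> buffer = new ArrayList<>();   // stands in for SpinedBuffer
            Consumer<R> sink = buffer::add;
            mapper.accept(element, sink);
            return buffer.stream();               // still one Stream per element
        };
    }

    static List<Integer> withBufferedFlatMap(List<String> words) {
        return words.stream()
                .flatMap(FlatMapVsMapMulti.<String, Integer>viaBuffer((w, downstream) -> {
                    for (int i = 0; i < w.length(); i++) {
                        downstream.accept(w.length());
                    }
                }))
                .collect(Collectors.toList());
    }
}
```

All three return the same list. Only the first and third create a Stream for each input element.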

Performance-wise, the new method mapMulti is a winner in such cases. Check out the benchmark at the bottom of this answer.

Filter-map scenario

Using this method instead of a lone filter or map doesn't make sense, due to its verbosity and the fact that one intermediate stream is created anyway. The exception might be replacing a .filter(..).map(..) chain, which comes in handy in cases such as checking an element's type and casting it.

int sum = Stream.of(1, 2.0, 3.0, 4F, 5, 6L)
        .mapMultiToInt((number, consumer) -> {
            if (number instanceof Integer) {
                consumer.accept((Integer) number);
            }
        })
        .sum();
// 6

// the same result with the filter(..).mapToInt(..) chain
int sumFilterMap = Stream.of(1, 2.0, 3.0, 4F, 5, 6L)
        .filter(number -> number instanceof Integer)
        .mapToInt(number -> (Integer) number)
        .sum();
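
If the type-check-and-cast pattern comes up often, it can be packaged once. The sketch below uses a hypothetical helper, ofType, which is not a JDK method. It returns a mapper that emits only instances of the given class, already cast:

```java
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.stream.Collectors;

class TypeFilterDemo {
    // Hypothetical helper (not part of the JDK): emits the element cast to `type`
    // when it is an instance of it, and nothing otherwise.
    static <T> BiConsumer<Object, Consumer<T>> ofType(Class<T> type) {
        return (element, downstream) -> {
            if (type.isInstance(element)) {
                downstream.accept(type.cast(element));
            }
        };
    }

    static List<Integer> integersOnly(List<? extends Number> numbers) {
        return numbers.stream()
                .<Integer>mapMulti(ofType(Integer.class))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ints = integersOnly(List.of(1, 2.0, 3.0, 4F, 5, 6L));
        System.out.println(ints);                                            // [1, 5]
        System.out.println(ints.stream().mapToInt(Integer::intValue).sum()); // 6
    }
}
```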

As seen above, variations such as mapMultiToDouble, mapMultiToInt and mapMultiToLong were introduced as well. They come along with mapMulti methods on the primitive streams, such as IntStream mapMulti(IntStream.IntMapMultiConsumer mapper). Three new functional interfaces were also introduced: IntStream.IntMapMultiConsumer, LongStream.LongMapMultiConsumer and DoubleStream.DoubleMapMultiConsumer. Basically, they are primitive variations of BiConsumer<T, Consumer<R>>, for example:

@FunctionalInterface
interface IntMapMultiConsumer {
    void accept(int value, IntConsumer ic);
}
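
Here is a short sketch of the primitive variants, assuming Java 16 or later. IntStream.mapMulti receives an int and an IntConsumer. mapMultiToInt turns an object stream into an IntStream without boxing the emitted values. The method names are illustrative.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class PrimitiveMapMulti {
    // IntStream.mapMulti: the mapper is an IntStream.IntMapMultiConsumer,
    // i.e. (int value, IntConsumer ic) -> ...
    static int[] repeatEachByItself(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                .mapMulti((n, ic) -> {
                    for (int i = 0; i < n; i++) {
                        ic.accept(n); // primitive int, no boxing
                    }
                })
                .toArray();
    }

    // Stream.mapMultiToInt: (element, IntConsumer) -> ..., result is an IntStream
    static int sumOfLongWordLengths(String... words) {
        return Arrays.stream(words)
                .mapMultiToInt((w, ic) -> {
                    if (w.length() > 4) {
                        ic.accept(w.length());
                    }
                })
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(repeatEachByItself(3)));               // [1, 2, 2, 3, 3, 3]
        System.out.println(sumOfLongWordLengths("Java", "Python", "JavaScript")); // 16
    }
}
```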

Combined real use-case scenario

The real power of this method lies in its flexibility and in creating only one Stream at a time, which is the major advantage over flatMap. The two snippets below flat-map a Product and its List<Variation> into 0..n offers, represented by the Offer class. Which offers are produced depends on two conditions: the product category and the variation availability.

  • Product with String name, int basePrice, String category and List<Variation> variations.
  • Variation with String name, int price and boolean availability.
// mapMulti version; the <Offer> type witness is needed, otherwise R is
// inferred as Object and the result cannot be assigned to List<Offer>
List<Product> products = ...
List<Offer> offers = products.stream()
        .<Offer>mapMulti((product, consumer) -> {
            if ("PRODUCT_CATEGORY".equals(product.getCategory())) {
                for (Variation v : product.getVariations()) {
                    if (v.isAvailable()) {
                        Offer offer = new Offer(
                                product.getName() + "_" + v.getName(),
                                product.getBasePrice() + v.getPrice());
                        consumer.accept(offer);
                    }
                }
            }
        })
        .collect(Collectors.toList());

// equivalent filter/flatMap/map version
List<Product> products = ...
List<Offer> offers = products.stream()
        .filter(product -> "PRODUCT_CATEGORY".equals(product.getCategory()))
        .flatMap(product -> product.getVariations().stream()
                .filter(Variation::isAvailable)
                .map(v -> new Offer(
                        product.getName() + "_" + v.getName(),
                        product.getBasePrice() + v.getPrice())))
        .collect(Collectors.toList());
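
To try the mapMulti version end to end, here is a self-contained sketch. The Product, Variation and Offer types are hypothetical minimal records built from the fields listed above. Their accessors are therefore category(), variations() and so on, instead of the getters used in the snippets. The category is also passed in as a parameter.

```java
import java.util.List;
import java.util.stream.Collectors;

class OfferDemo {
    // Hypothetical minimal models based on the field lists above.
    record Variation(String name, int price, boolean available) {}
    record Product(String name, int basePrice, String category, List<Variation> variations) {}
    record Offer(String name, int price) {}

    static List<Offer> offers(List<Product> products, String category) {
        return products.stream()
                .<Offer>mapMulti((product, consumer) -> {
                    if (!category.equals(product.category())) {
                        return; // other category: emit nothing for this product
                    }
                    for (Variation v : product.variations()) {
                        if (v.available()) { // one offer per available variation
                            consumer.accept(new Offer(
                                    product.name() + "_" + v.name(),
                                    product.basePrice() + v.price()));
                        }
                    }
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Product> products = List.of(
                new Product("shirt", 20, "CLOTHES", List.of(
                        new Variation("red", 2, true),
                        new Variation("blue", 3, false))),
                new Product("mug", 5, "KITCHEN", List.of(
                        new Variation("white", 1, true))));
        // prints [Offer[name=shirt_red, price=22]]
        System.out.println(offers(products, "CLOTHES"));
    }
}
```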

The use of mapMulti is more imperative than the declarative approach of combining the pre-existing Stream methods, as seen in the latter snippet using flatMap, map, and filter. From this perspective, whether the imperative approach is easier depends on the use case. Recursion is a good example, described in the JavaDoc.
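
As an illustration of the recursive case, the sketch below flattens arbitrarily nested Iterables, in the spirit of the JavaDoc example. The names are my own, and it relies on Java 16 pattern matching for instanceof.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

class RecursiveFlatten {
    // Recursively pushes every non-Iterable leaf into the consumer, depth-first.
    static void expand(Object element, Consumer<Object> downstream) {
        if (element instanceof Iterable<?> children) {
            for (Object child : children) {
                expand(child, downstream);
            }
        } else if (element != null) {
            downstream.accept(element);
        }
    }

    static List<Object> flatten(List<?> nested) {
        return nested.stream()
                .<Object>mapMulti(RecursiveFlatten::expand)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Object> nested = List.of(1, List.of(2, List.of(3, 4)), 5);
        System.out.println(flatten(nested)); // [1, 2, 3, 4, 5]
    }
}
```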

Benchmark

As promised, I have written a bunch of micro-benchmarks based on ideas collected from the comments. Since there is quite a lot of code to publish, I have created a GitHub repository with the implementation details and will share only the results here.

Stream::flatMap(Function) vs Stream::mapMulti(BiConsumer) Source

Here we can see a huge difference. It supports the claim that the newer method works as described and avoids the overhead of creating a new Stream instance for each processed element.

Benchmark                                   Mode  Cnt   Score   Error  Units
MapMulti_FlatMap.flatMap                    avgt   25  73.852 ± 3.433  ns/op
MapMulti_FlatMap.mapMulti                   avgt   25  17.495 ± 0.476  ns/op

Stream::filter(Predicate).map(Function) vs Stream::mapMulti(BiConsumer) Source

Using chained (not nested) pipelines is fine: the two scores lie within each other's error margins.

Benchmark                                   Mode  Cnt    Score  Error  Units
MapMulti_FilterMap.filterMap                avgt   25   7.973 ± 0.378  ns/op
MapMulti_FilterMap.mapMulti                 avgt   25   7.765 ± 0.633  ns/op

Stream::flatMap(Function) with Optional::stream() vs Stream::mapMulti(BiConsumer) Source

This one is very interesting, especially in terms of usage (see the source code). We are now able to flatten using mapMulti(Optional::ifPresent), and, as expected, the new method is faster in this case: roughly twice as fast in this benchmark.

Benchmark                                   Mode  Cnt   Score   Error  Units
MapMulti_FlatMap_Optional.flatMap           avgt   25  20.186 ± 1.305  ns/op
MapMulti_FlatMap_Optional.mapMulti          avgt   25  10.498 ± 0.403  ns/op
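
In code, the Optional case from the last benchmark looks roughly like this. flatMap(Optional::stream) creates a zero- or one-element stream per Optional. mapMulti(Optional::ifPresent) simply forwards the value, if any, to the consumer. Class and method names are illustrative.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class OptionalFlatten {
    // flatMap: each Optional becomes a Stream of zero or one element
    static List<String> withFlatMap(List<Optional<String>> values) {
        return values.stream()
                .flatMap(Optional::stream)
                .collect(Collectors.toList());
    }

    // mapMulti: Optional::ifPresent already has the (Optional<String>, Consumer<String>) shape
    static List<String> withMapMulti(List<Optional<String>> values) {
        return values.stream()
                .<String>mapMulti(Optional::ifPresent)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Optional<String>> values = List.of(Optional.of("a"), Optional.empty(), Optional.of("b"));
        System.out.println(withFlatMap(values));  // [a, b]
        System.out.println(withMapMulti(values)); // [a, b]
    }
}
```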

Answer 2

Score: 11

To address the scenario

> When it is easier to use an imperative approach for generating result elements than it is to return them in the form of a Stream.

We can see it as a limited variant of C#'s yield statement. The limitations are that we always need an initial input from a stream, as this is an intermediate operation. Further, there is no short-circuiting for the elements we push in one function evaluation.

Still, it opens interesting opportunities.

E.g., implementing a stream of Fibonacci numbers formerly required a solution using temporary objects capable of holding two values.

Now, we can use something like:

IntStream.of(0)
    .mapMulti((a,c) -> {
        for(int b = 1; a >=0; b = a + (a = b))
            c.accept(a);
    })
    /* additional stream operations here */
    .forEach(System.out::println);

It stops when the int values overflow. As said, it won't short-circuit when we use a terminal operation that does not consume all values. However, this loop producing then-ignored values might still be faster than the other approaches.
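
The lack of short-circuiting can be observed directly. In this sketch, the counter parameter exists only for illustration. limit(10) keeps just the first ten numbers. The loop inside the single mapper call still runs until the int overflow, producing every Fibonacci number that fits into an int (F0 through F46, i.e. 47 values).

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class FibonacciMapMulti {
    // `produced` only counts how many values the loop pushes downstream.
    static IntStream fibonacci(AtomicInteger produced) {
        return IntStream.of(0)
                .mapMulti((a, c) -> {
                    // a doubles as loop variable; the loop ends once a overflows to a negative int
                    for (int b = 1; a >= 0; b = a + (a = b)) {
                        produced.incrementAndGet();
                        c.accept(a);
                    }
                });
    }

    public static void main(String[] args) {
        AtomicInteger produced = new AtomicInteger();
        int[] firstTen = fibonacci(produced).limit(10).toArray();
        System.out.println(Arrays.toString(firstTen)); // [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
        System.out.println(produced.get());            // 47
    }
}
```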

Another example inspired by this answer, to iterate over a class hierarchy from root to most specific:

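import java.util.LinkedHashMap;
import java.util.function.Consumer;
import java.util.stream.Stream;

class MapMultiExamples {
    public static void main(String[] args) {
        Stream.of(LinkedHashMap.class).mapMulti(MapMultiExamples::hierarchy)
            /* additional stream operations here */
            .forEach(System.out::println);
    }

    static void hierarchy(Class<?> cl, Consumer<? super Class<?>> co) {
        if(cl != null) {
            hierarchy(cl.getSuperclass(), co);
            co.accept(cl);
        }
    }
}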

which unlike the old approaches does not require additional heap storage and will likely run faster (assuming reasonable class depths that do not make recursion backfire).

Also, monsters like this

> List<A> list = IntStream.range(0, r_i).boxed()
>     .flatMap(i -> IntStream.range(0, r_j).boxed()
>         .flatMap(j -> IntStream.range(0, r_k)
>             .mapToObj(k -> new A(i, j, k))))
>     .collect(Collectors.toList());

can now be written like

List<A> list = IntStream.range(0, r_i).boxed()
    .<A>mapMulti((i,c) -> {
        for(int j = 0; j &lt; r_j; j++) {
            for(int k = 0; k &lt; r_k; k++) {
                c.accept(new A(i, j, k));
            }
        }
    })
    .collect(Collectors.toList());
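
Here is a runnable comparison of the two formulations. It assumes A is a simple holder of three ints, modelled as a hypothetical record, and turns r_i, r_j and r_k into method parameters:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class NestedLoopsDemo {
    // Hypothetical stand-in for the A class used above.
    record A(int i, int j, int k) {}

    static List<A> withFlatMap(int ri, int rj, int rk) {
        return IntStream.range(0, ri).boxed()
                .flatMap(i -> IntStream.range(0, rj).boxed()
                        .flatMap(j -> IntStream.range(0, rk)
                                .mapToObj(k -> new A(i, j, k))))
                .collect(Collectors.toList());
    }

    static List<A> withMapMulti(int ri, int rj, int rk) {
        return IntStream.range(0, ri).boxed()
                .<A>mapMulti((i, c) -> {
                    // plain nested loops replace the inner flatMap stages
                    for (int j = 0; j < rj; j++) {
                        for (int k = 0; k < rk; k++) {
                            c.accept(new A(i, j, k));
                        }
                    }
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(withMapMulti(1, 2, 2));
        // prints [A[i=0, j=0, k=0], A[i=0, j=0, k=1], A[i=0, j=1, k=0], A[i=0, j=1, k=1]]
        System.out.println(withMapMulti(2, 3, 4).equals(withFlatMap(2, 3, 4))); // true
    }
}
```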

Compared to the nested flatMap steps, it loses some parallelism opportunity, which the reference implementation didn’t exploit anyway. For a non-short-circuiting operation like above, the new method likely will benefit from the reduced boxing and less instantiation of capturing lambda expressions. But of course, it should be used judiciously, not to rewrite every construct to an imperative version (after so many people tried to rewrite every imperative code into a functional version)…

huangapple
  • Published on 30 September 2020, 15:27:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64132803.html