Using StringBuilder(...) as an identity value in reduce operations gives an unpredictable outcome

Question

The question is straightforward: why can't we use a StringBuilder(...) as the identity value (with append as the accumulator) in the reduce(...) operation of Java 8 streams, when an empty string with string1.concat(string2) works fine?

string1.concat(string2) can be seen as similar to builder.append(string) (though I understand there are a few differences between these operations), but I am not able to understand why they behave differently in the reduce operation. Consider the following example:

  List<String> list = Arrays.asList("1", "2", "3");

  // Example using the string concatenation operator
  System.out.println(list.stream().parallel()
            .reduce("", (s1, s2) -> s1 + s2, (s1, s2) -> s1 + s2));

  // The same example, using a StringBuilder
  System.out.println(list.stream().parallel()
            .reduce(new StringBuilder(""), (builder, s) -> builder
                    .append(s), (builder1, builder2) -> builder1
                    .append(builder2)));

  // Using the actual concat(...) method
  System.out.println(list.stream().parallel()
            .reduce("", (s1, s2) -> s1.concat(s2), (s1, s2) -> s1.concat(s2)));

Here is the output after executing the lines above:

 123
 321321321321   // output when StringBuilder() is used as Identity
 123

builder.append(string) is an associative operation, just as str1.concat(str2) is. So why does concat work while append doesn't?

Answer 1

Score: 9

Yes, append is indeed associative, but that is not the only requirement for the functions passed as the accumulator and combiner. According to the docs, they have to be:

  • Associative
  • Non-interfering
  • Stateless

append is not stateless. It is stateful. When you do sb.append("Hello"), it not only returns a StringBuilder with Hello appended to the end, it also changes the contents (i.e. the state) of sb.

Also from the docs:

> Stream pipeline results may be nondeterministic or incorrect if the behavioral parameters to the stream operations are stateful. A stateful lambda (or other object implementing the appropriate functional interface) is one whose result depends on any state which might change during the execution of the stream pipeline.

For the same reason, new StringBuilder() stops being a valid identity as soon as the accumulator or the combiner has been applied to it. Something will have been appended to the formerly empty string builder, and the following equation, which every identity must satisfy, no longer holds:

combiner.apply(u, accumulator.apply(identity, t)) == accumulator.apply(u, t)

It is possible that the parallel stream makes use of the old string builders after calling the accumulators and/or combiners, and expects their contents to not be changed. However, the accumulators and combiners mutate the string builders, causing the stream to produce incorrect results.
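To make the broken identity concrete, here is a minimal single-threaded sketch. The class and method names (IdentityLawDemo and so on) are made up for illustration; only the two lambdas come from the question. It applies the accumulator once by hand and then checks what the "identity" contributes when it is combined with another value.

```java
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

class IdentityLawDemo {
    // The mutating accumulator and combiner from the question.
    static final BiFunction<StringBuilder, String, StringBuilder> ACCUMULATOR =
            (builder, s) -> builder.append(s);
    static final BinaryOperator<StringBuilder> COMBINER =
            (b1, b2) -> b1.append(b2);

    /** Content of the identity object after a single accumulator call. */
    static String identityAfterOneAccumulate() {
        StringBuilder identity = new StringBuilder();
        ACCUMULATOR.apply(identity, "1"); // mutates identity in place
        return identity.toString();
    }

    /** combiner(identity, u) once the identity has already been accumulated into. */
    static String combineUsedBuilderIdentityWith(String u) {
        StringBuilder identity = new StringBuilder();
        ACCUMULATOR.apply(identity, "1");
        return COMBINER.apply(identity, new StringBuilder(u)).toString();
    }

    /** The same steps with an immutable String identity. */
    static String combineUsedStringIdentityWith(String u) {
        String identity = "";
        identity.concat("1");        // returns a new String; identity is untouched
        return identity.concat(u);   // still behaves as an identity
    }

    public static void main(String[] args) {
        System.out.println(identityAfterOneAccumulate());          // 1
        System.out.println(combineUsedBuilderIdentityWith("2"));   // 12
        System.out.println(combineUsedStringIdentityWith("2"));    // 2
    }
}
```

With the builder, combining the used identity with "2" yields "12" instead of "2", which is exactly the property an identity is supposed to guarantee.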

On the other hand, concat satisfies all three of the above. It is stateless because it does not change the string on which it is called. It just returns a new, concatenated string. (String is immutable anyway and can't be changed :D)

Anyway, this is a use case for mutable reduction with collect:

System.out.println((StringBuilder)list.stream().parallel()
    .collect(
        StringBuilder::new, 
        StringBuilder::append, 
        StringBuilder::append
    )
);
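For anyone who wants to run it, here is a self-contained sketch of the same collect call wrapped in a class. The class and method names are hypothetical, and the comparison with Collectors.joining() is added only as a cross-check. Because the supplier creates a fresh StringBuilder for each partial result, nothing is shared between threads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MutableReductionDemo {
    /** Mutable reduction: each partial result gets its own StringBuilder from the supplier. */
    static String joinWithCollect(List<String> items) {
        StringBuilder result = items.stream().parallel()
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append);
        return result.toString();
    }

    /** The built-in collector, used here only to cross-check the result. */
    static String joinWithJoining(List<String> items) {
        return items.stream().parallel().collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("1", "2", "3");
        System.out.println(joinWithCollect(list));   // 123
        System.out.println(joinWithJoining(list));   // 123
    }
}
```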

Answer 2

Score: 1

After reading the docs and running many tests, I think reduce works roughly like this:

  1. multiple threads carry out the reduce, and each thread does a partial reduce;
  2. there is only one identity instance, and every accumulator uses that same instance;
  3. each thread first accumulates the identity instance with a string element to get a StringBuilder;
  4. all of these StringBuilders are then combined.

So the problem is that every accumulation of the identity instance with a string element changes the identity. After the first accumulation, the identity is no longer an identity.

For example, consider a list with 2 elements, {"1", "2"}.
There will be 2 threads; each does 1 accumulation, and one of them does the final combine.
Thread A accumulates the identity with element "1". The result is a StringBuilder whose content is "1" (it is still the identity instance, because StringBuilder.append returns the builder itself), so the identity's content has also changed to "1". Then thread B accumulates the identity with element "2", and the result is "12", not "2".
The combine step then merges these two results, which are both the identity instance itself, so the final result is "1212".
It is like the following code snippet:

StringBuilder identity = new StringBuilder();
StringBuilder accumulate1 = identity.append("1");
StringBuilder accumulate2 = identity.append("2");
StringBuilder combine = accumulate1.append(accumulate2);
// combine, accumulate1 and accumulate2 are all the identity instance, and the result is "1212"
return combine;

With more elements, the result differs from run to run, because the threads are scheduled unpredictably.

Now that we know the reason, we can fix the accumulator like this:

new StringBuilder(builder).append(s)

and the full line of code becomes:

System.out.println(list.stream().parallel().reduce(new StringBuilder(), (builder, s) -> new StringBuilder(builder).append(s),
        (builder1, builder2) -> new StringBuilder(builder1).append(builder2)));

Then there is no longer an issue, because the accumulator does not change the identity instance and returns a new StringBuilder every time. But it is not worth doing, as it has no benefit over the String concat approach.

Edit: Thanks to @Holger's example, it seems that if there is a filter function, some accumulators may be skipped, so the combiner function also needs to be changed to

new StringBuilder(builder1).append(builder2)
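As a quick check of the copying approach, here is a small sketch that wraps it in a method and inspects the identity afterwards. The class and method names are invented for the example; the lambdas are the ones from this answer. Since neither function mutates its arguments, the identity object should still be empty when the reduce finishes.

```java
import java.util.Arrays;
import java.util.List;

class NonMutatingReduceDemo {
    /** Parallel reduce in which accumulator and combiner copy instead of mutating. */
    static String reduceWithCopies(List<String> items, StringBuilder identity) {
        StringBuilder result = items.stream().parallel().reduce(
                identity,
                (builder, s) -> new StringBuilder(builder).append(s),
                (b1, b2) -> new StringBuilder(b1).append(b2));
        return result.toString();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("1", "2", "3");
        StringBuilder identity = new StringBuilder();
        System.out.println(reduceWithCopies(list, identity));   // 123
        System.out.println(identity.length());                  // 0, identity was never modified
    }
}
```

Every step allocates a new builder, which is why this has no advantage over plain String concatenation.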

Answer 3

Score: 0

Don't use .reduce() (or your own .collect(), as in Sweeper's answer) when there is already an implementation.

List<String> list = Arrays.asList("1", "2", "3");

// Example using the built-in joining collector
System.out.println(list.stream()
   .parallel()
   .collect(Collectors.joining())
);
// prints "123"

Edit (this will not work for parallel streams)

Based on the implementation of .joining():

final List<String> list = Arrays.asList("1", "2", "3");
System.out.println(list.stream().reduce(new StringBuilder(),
    StringBuilder::append,
    StringBuilder::append)
    .toString()
);
// prints "123"
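The sequential version gives the right text only because one thread passes the same builder through every step. The sketch below (hypothetical class and method names, lambdas instead of method references, behaviour as observed on OpenJDK's sequential reduce) shows that the returned object is the identity itself, so reusing that identity object for a second reduce carries over the old content.

```java
import java.util.Arrays;
import java.util.List;

class SequentialReduceDemo {
    /** Sequential reduce with the mutating append functions. */
    static StringBuilder reduceSequentially(List<String> items, StringBuilder identity) {
        return items.stream().reduce(
                identity,
                (builder, s) -> builder.append(s),
                (b1, b2) -> b1.append(b2));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("1", "2", "3");
        StringBuilder identity = new StringBuilder();

        StringBuilder first = reduceSequentially(list, identity);
        System.out.println(first);               // 123
        System.out.println(first == identity);   // true: the identity object was mutated and returned

        // Reusing the same (already mutated) identity object appends to the old content.
        System.out.println(reduceSequentially(list, identity));   // 123123
    }
}
```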

huangapple
  • Posted on August 24, 2020, 15:39:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63556636.html