Optimization of Java Stream API functional interfaces for a highly loaded system

Question

We have methods that use the Java Stream API and are invoked very frequently, e.g. 10,000 to 20,000 times per second (a data streaming system). Let's review the following simple test method (intentionally simplified, with no real value):

public void test() {
    Stream.of(1, 2, 3, 4, 5)
            .map(i -> i * i)
            .filter(new SuperPredicate())
            .sorted(Comparator.comparing(i -> -i + 1, Comparator.nullsFirst(Comparator.naturalOrder())))
            .forEach(System.out::println);
}

class SuperPredicate implements Predicate<Integer> {
    public SuperPredicate() {
        System.out.println("SuperPredicate constructor");
    }
    @Override
    public boolean test(Integer i) {
        return i % 3 != 0;
    }
}

On each invocation of the test method, new instances of the functional interfaces are created (in our example, SuperPredicate and the Comparator.nullsFirst() comparator). So with frequent invocations, thousands of excess objects are created every second. I understand that creating an object takes only a few nanoseconds in Java, but under high load it might still increase the load on the GC and, as a result, affect performance.

As I see it, we could move the creation of such functional interfaces into private static final fields of the same class, since they are stateless; this would slightly decrease the load on the system. It's a kind of micro-optimization. Do we need to do this? Does the Java compiler or the JIT compiler somehow optimize such cases? Or does the compiler have options or optimization flags to improve such cases?
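For reference, the hoisting I have in mind could look roughly like the sketch below. The class name HoistedFunctions, the field names, and the CREATED counter are made up for illustration, and the pipeline collects into a List instead of printing so the result can be inspected.

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical holder class; only the static final fields matter here.
final class HoistedFunctions {
    // Both objects are stateless, so a single shared instance can be reused
    // across invocations and threads.
    private static final Predicate<Integer> NOT_DIVISIBLE_BY_3 = new SuperPredicate();
    private static final Comparator<Integer> BY_NEGATED_VALUE =
            Comparator.comparing((Integer i) -> -i + 1,
                    Comparator.nullsFirst(Comparator.naturalOrder()));

    // Same pipeline as test(), but returning the elements instead of printing them.
    static List<Integer> test() {
        return Stream.of(1, 2, 3, 4, 5)
                .map(i -> i * i)
                .filter(NOT_DIVISIBLE_BY_3)
                .sorted(BY_NEGATED_VALUE)
                .collect(Collectors.toList());
    }
}

final class SuperPredicate implements Predicate<Integer> {
    // Illustrative counter (replaces the println in the constructor).
    static final AtomicInteger CREATED = new AtomicInteger();

    SuperPredicate() {
        CREATED.incrementAndGet();
    }

    @Override
    public boolean test(Integer i) {
        return i % 3 != 0;
    }
}
```

With this layout the predicate and the comparators are created once, when the class is initialized, no matter how often test() is called.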

Answer 1

Score: 2

You can only store objects in static final fields for reuse when they don't depend on variables of the surrounding context, let alone on potentially changing state.

In that case, there is no reason to create a class like SuperPredicate at all. You can simply use i -> i % 3 != 0 and get the behavior of remembering the first created instance for free. As explained in https://stackoverflow.com/q/27524445/2711488, in the reference implementation the instances created for non-capturing lambda expressions are remembered and reused.
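A small sketch to observe this yourself, assuming an OpenJDK-based JVM (the class and method names are made up for illustration). The reuse is a property of the current reference implementation, not something the language specification guarantees, so treat the identity comparison as an experiment rather than a contract.

```java
import java.util.function.Predicate;

// Hypothetical demo class for comparing lambda instance identity.
final class LambdaReuseDemo {
    // Non-capturing: the body uses only its own parameter.
    static Predicate<Integer> nonCapturing() {
        return i -> i % 3 != 0;
    }

    // Capturing: the body depends on the local variable 'divisor'.
    static Predicate<Integer> capturing(int divisor) {
        return i -> i % divisor != 0;
    }

    public static void main(String[] args) {
        // Compares the instances returned by two evaluations of the same lambda expression.
        System.out.println("non-capturing reused: " + (nonCapturing() == nonCapturing()));
        System.out.println("capturing reused:     " + (capturing(3) == capturing(3)));
    }
}
```

If the first line reports a reused instance on your JVM, a static final field for such a lambda gains nothing over writing the lambda inline.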

There is no need for a new comparator either. Leaving potential overflow aside, the function i -> -i + 1 just reverses the order due to the negation, whereas the +1 has no effect on the order. Since the result of the expression -i + 1 can never be null, there is no need for Comparator.nullsFirst(Comparator.naturalOrder()). So you can replace the entire comparator with Comparator.reverseOrder(), with the same result but without any object instantiation, as reverseOrder() returns a shared singleton.
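To check the equivalence on the values from the example, a sketch like the following can be used (class and method names are made up for illustration; the lambda parameter is typed explicitly only to keep type inference simple outside the stream context):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper that sorts the filtered squares with either comparator.
final class ComparatorEquivalence {
    static List<Integer> sortWith(Comparator<Integer> comparator) {
        // The squares of 1..5 that survive the i % 3 != 0 filter.
        List<Integer> values = new ArrayList<>(Arrays.asList(1, 4, 16, 25));
        values.sort(comparator);
        return values;
    }

    // The comparator from the question.
    static List<Integer> withOriginalComparator() {
        return sortWith(Comparator.comparing((Integer i) -> -i + 1,
                Comparator.nullsFirst(Comparator.naturalOrder())));
    }

    // The suggested replacement.
    static List<Integer> withReverseOrder() {
        return sortWith(Comparator.reverseOrder());
    }
}
```

Both methods return the elements in descending order. The two comparators only differ for inputs where -i + 1 overflows, which is the caveat mentioned above.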

As explained in https://stackoverflow.com/q/28023364/2711488, the method reference System.out::println captures the current value of System.out. So the reference implementation does not reuse the instance, because it refers to a particular PrintStream instance. If we change it to i -> System.out.println(i), it becomes a non-capturing lambda expression which re-reads System.out on each function evaluation.
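The difference becomes visible when System.out is replaced after the functions have been created. The following is only an illustrative sketch (the class name and the two in-memory buffers are made up for the demonstration):

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.function.Consumer;

// Hypothetical demo: redirects System.out twice and records where each call ends up.
final class CaptureDemo {
    static String[] run() {
        PrintStream original = System.out;
        ByteArrayOutputStream first = new ByteArrayOutputStream();
        ByteArrayOutputStream second = new ByteArrayOutputStream();
        try {
            System.setOut(new PrintStream(first, true));
            // Captures the PrintStream that System.out refers to right now.
            Consumer<Object> methodRef = System.out::println;
            // Captures nothing; reads System.out each time it is invoked.
            Consumer<Object> lambda = o -> System.out.println(o);

            System.setOut(new PrintStream(second, true));
            methodRef.accept("via method reference"); // still goes to 'first'
            lambda.accept("via lambda");              // goes to 'second'
        } finally {
            System.setOut(original);
        }
        return new String[] { first.toString().trim(), second.toString().trim() };
    }
}
```

The method reference keeps writing to the stream that was current when it was created, while the lambda follows the redirection.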

So when we use

Stream.of(1, 2, 3, 4, 5)
    .map(i -> i * i)
    .filter(i -> i % 3 != 0)
    .sorted(Comparator.reverseOrder())
    .forEach(i -> System.out.println(i));

instead of your example code, we get the same result, but save four object instantiations, for the predicate, the consumer, the nullsFirst(…) comparator and the comparing(…) comparator.


To estimate the impact of this saving, consider the other allocations. Stream.of(…) is a varargs method, so a temporary array is created for the arguments; it then returns an object representing the stream pipeline. Each intermediate operation creates another temporary object representing the changed state of the stream pipeline. Internally, a Spliterator implementation instance is used. This makes a total of six temporary objects, just for describing the operation.

When the terminal operation starts, a new object representing the operation is created. Each intermediate operation is represented by a Consumer implementation holding a reference to the next consumer, so the composed consumer can be passed to the Spliterator's forEachRemaining method. Since sorted is a stateful operation, it first stores all elements in an intermediate ArrayList (which makes two objects) and sorts them before passing them to the next consumer.

This makes a total of twelve objects as the fixed overhead of the stream pipeline. The operation System.out.println(i) converts each Integer object to a String object, which consists of two objects, as each String object is a wrapper around an array object. This gives ten additional objects for this specific example but, more importantly, two objects per element, so using the same stream pipeline for a larger dataset increases the number of objects created during the operation.

I think the actual number of temporary objects created, both in plain sight and behind the scenes, renders the saving of four objects irrelevant. If allocation and garbage collection performance ever becomes relevant for your operation, you usually have to focus on the per-element costs rather than the fixed costs of the stream pipeline.

  • Posted by huangapple on May 31, 2020, 01:59:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/62106608.html