2020年4月6日 05:58:25go评论114阅读模式

英文:

Scala optimization witth specialized types: sum with longs

问题

以下是翻译好的内容：

我有以下简单的代码

val longs: Vector[Long] = (1L to 1000000L).toVector

以及据称等效的Java代码

def jLongs: java.util.stream.LongStream = java.util.stream.LongStream
    .iterate(1L, (i: Long) => i <= 1000000L, (i: Long) => i + 1L)

当我运行以下代码的基准测试

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MILLISECONDS)
class BoxingScala {
  val longs: Vector[Long] = (1L to 1000000L).toVector
  def jLongs: java.util.stream.LongStream = java.util.stream.LongStream
    .iterate(1L, (i: Long) => i <= 1000000L, (i: Long) => i + 1L)
  @Benchmark def a: Long = longs.sum
  @Benchmark def b: java.lang.Long = jLongs.sum()
}

我发现Java代码大约快了400％。当我尝试理解其中的原因时，我找到了这段字节码：

// access flags 0x1
public a()J
@Lorg/openjdk/jmh/annotations/Benchmark;()
 L0
  LINENUMBER <benchmark a> L0
  ALOAD 0
  INVOKEVIRTUAL gurghet/BoxingScala.longs ()Lscala/collection/immutable/Vector;
  GETSTATIC scala/math/Numeric$LongIsIntegral$.MODULE$ : Lscala/math/Numeric$LongIsIntegral$;
  INVOKEVIRTUAL scala/collection/immutable/Vector.sum (Lscala/math/Numeric;)Ljava/lang/Object;
  INVOKESTATIC scala/runtime/BoxesRunTime.unboxToLong (Ljava/lang/Object;)J
  LRETURN
 L1
  LOCALVARIABLE this Lgurghet/BoxingScala; L0 L1 0
  MAXSTACK = 2
  MAXLOCALS = 1
// access flags 0x1
public b()Ljava/lang/Long;
@Lorg/openjdk/jmh/annotations/Benchmark;()
 L0
  LINENUMBER <benchmark b> L0
  GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
  ALOAD 0
  INVOKEVIRTUAL gurghet/BoxingScala.jLongs ()Ljava/util/stream/LongStream;
  INVOKEINTERFACE java/util/stream/LongStream.sum ()J (itf)
  INVOKEVIRTUAL scala/Predef$.long2Long (J)Ljava/lang/Long;
  ARETURN
 L1
  LOCALVARIABLE this Lgurghet/BoxingScala; L0 L1 0
  MAXSTACK = 3
  MAXLOCALS = 1

其中longs是通过以下方式初始化的

L1
  LINENUMBER <init> L1
  ALOAD 0
  NEW scala/runtime/RichLong
  DUP
  GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
  LCONST_1
  INVOKEVIRTUAL scala/Predef$.longWrapper (J)J
  INVOKESPECIAL scala/runtime/RichLong.<init> (J)V
  LDC 1000000
  INVOKESTATIC scala/runtime/BoxesRunTime.boxToLong (J)Ljava/lang/Long;
  INVOKEVIRTUAL scala/runtime/RichLong.to (Ljava/lang/Object;)Lscala/collection/immutable/NumericRange$Inclusive;
  INVOKEVIRTUAL scala/collection/immutable/NumericRange$Inclusive.toVector ()Lscala/collection/immutable/Vector;
  PUTFIELD gurghet/BoxingScala.longs : Lscala/collection/immutable/Vector;

因此，我认为Scala版本被强制加载了一百万个对象。

这是否是它如此慢的原因？我如何告诉它专门为long值进行优化？

另外，有趣且违反直觉的是，尽管Java代码返回一个对象，但在Scala中返回一个原始的long值（参见ARETURN与LRETURN）。

英文:

I have the following simple code

val longs: Vector[Long] = (1L to 1000000L).toVector

and the supposedly equivalent Java

def jLongs: java.util.stream.LongStream = java.util.stream.LongStream
    .iterate(1L, (i: Long) =&gt; i &lt;= 1000000L, (i: Long) =&gt; i + 1L)

When I run a benchmark with the following code

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MILLISECONDS)
class BoxingScala {
  val longs: Vector[Long] = (1L to 1000000L).toVector
  def jLongs: java.util.stream.LongStream = java.util.stream.LongStream
    .iterate(1L, (i: Long) =&gt; i &lt;= 1000000L, (i: Long) =&gt; i + 1L)
  @Benchmark def a: Long = longs.sum
  @Benchmark def b: java.lang.Long = jLongs.sum()
}

I get that the Java code is roughly 400% faster. When I try to understand why, I find this bytecode:

  // access flags 0x1
  public a()J
  @Lorg/openjdk/jmh/annotations/Benchmark;()
   L0
    LINENUMBER &lt;benchmark a&gt; L0
    ALOAD 0
    INVOKEVIRTUAL gurghet/BoxingScala.longs ()Lscala/collection/immutable/Vector;
    GETSTATIC scala/math/Numeric$LongIsIntegral$.MODULE$ : Lscala/math/Numeric$LongIsIntegral$;
    INVOKEVIRTUAL scala/collection/immutable/Vector.sum (Lscala/math/Numeric;)Ljava/lang/Object;
    INVOKESTATIC scala/runtime/BoxesRunTime.unboxToLong (Ljava/lang/Object;)J
    LRETURN
   L1
    LOCALVARIABLE this Lgurghet/BoxingScala; L0 L1 0
    MAXSTACK = 2
    MAXLOCALS = 1
  // access flags 0x1
  public b()Ljava/lang/Long;
  @Lorg/openjdk/jmh/annotations/Benchmark;()
   L0
    LINENUMBER &lt;benchmark b&gt; L0
    GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
    ALOAD 0
    INVOKEVIRTUAL gurghet/BoxingScala.jLongs ()Ljava/util/stream/LongStream;
    INVOKEINTERFACE java/util/stream/LongStream.sum ()J (itf)
    INVOKEVIRTUAL scala/Predef$.long2Long (J)Ljava/lang/Long;
    ARETURN
   L1
    LOCALVARIABLE this Lgurghet/BoxingScala; L0 L1 0
    MAXSTACK = 3
    MAXLOCALS = 1

where longs is initialized by

 L1
    LINENUMBER &lt;init&gt; L1
    ALOAD 0
    NEW scala/runtime/RichLong
    DUP
    GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
    LCONST_1
    INVOKEVIRTUAL scala/Predef$.longWrapper (J)J
    INVOKESPECIAL scala/runtime/RichLong.&lt;init&gt; (J)V
    LDC 1000000
    INVOKESTATIC scala/runtime/BoxesRunTime.boxToLong (J)Ljava/lang/Long;
    INVOKEVIRTUAL scala/runtime/RichLong.to (Ljava/lang/Object;)Lscala/collection/immutable/NumericRange$Inclusive;
    INVOKEVIRTUAL scala/collection/immutable/NumericRange$Inclusive.toVector ()Lscala/collection/immutable/Vector;
    PUTFIELD gurghet/BoxingScala.longs : Lscala/collection/immutable/Vector;

So it seems to me that the scala version is forced to load a million objects.

Is this the reason why it's so slow? How can I tell to specialize for longs?

Also, it is interesting and counter intuitive the fact that, while the java code returns an object, in scala a primitive long is returned (cf. ARETURN vs. LRETURN).

答案1

得分: 2

&lt;!-- language-all: scala --&gt;
查看字节码是徒劳的。区别在于`Stream`的本质。`LongStream`根据需要产生元素。它不是数据结构；它是控制结构——潜在的循环，用于某些其他数据源。你的Java代码可以简化为
    var sum: Long = 0
    for(i &lt;- 1L to 1000000L) sum += i;
`Vector`是一个实际的数据结构，实际上必须存储100000个`long`，使得你的Scala版本本质上是
    val oops = new Array[java.lang.Long](1000000) // 装箱！
    for(i &lt;- 0 until 1000000) oops(i) = i + 1
    var sum: Long = 0
    for(i &lt;- 0 until 1000000) sum += oops(i)
这两者之间绝对没有等价关系。还要注意，`1L to 1000000L`是一个`NumericRange[Long]`，它已经是一个Scala集合，`(1 to 1000000L).sum`的速度比这两者都要快，因为它使用简单的算术公式来计算结果。最接近`LongStream`的东西实际上是`SeqView[Long]`，你可以通过`(1L to 1000000).view`获得它。如果在其上调用`sum`，我认为集合库并不足够智能，无法将其简化为对`NumericRange`的`sum`调用，而是会像在Java版本中那样迭代它，从而进行更近似的比较。不过它不会被专门化为`Long`，所以仍然会有装箱的惩罚。

英文:

Looking at the bytecode is futile. The difference is in what a Stream is. A LongStream produces elements on demand. It's not a data structure; it's a control structure—a latent loop over some other data source. Your Java boils down to

var sum: Long = 0
for(i &lt;- 1L to 1000000L) sum += i;

A Vector is an actual data structure, which actually has to store 100000 longs, making your Scala version essentially

val oops = new Array[java.lang.Long](1000000) // boxed!
for(i &lt;- 0 until 1000000) oops(i) = i + 1
var sum: Long = 0
for(i &lt;- 0 until 1000000) sum += oops(i)

There is absolutely no equivalence between these. Also note that 1L to 1000000L is a NumericRange[Long], which already is a Scala collection, and (1 to 1000000L).sum is much faster than either of these, since it uses a simple arithmetic formula to compute the result. The closest thing to a LongStream is actually a SeqView[Long], which you can get as (1L to 1000000).view. If you call sum on that, I believe the collections library is not smart enough to simplify that down to a sum call on the NumericRange, and will instead iterate it like in the Java version, making a closer comparison. It won't be specialized to Long, though, so it will still have the boxing penalty.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Scala优化与专用类型：使用长整型进行求和

问题

答案1

PowerMockito模拟在调用真实方法后被清除

Java Postgresql 如何自动生成主键

如何在这个递归方法中只使用一个返回值。

多态性与泛型？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。