Scala优化与专用类型:使用长整型进行求和

huangapple go评论64阅读模式
英文:

Scala optimization witth specialized types: sum with longs

问题

以下是翻译好的内容:

我有以下简单的代码

val longs: Vector[Long] = (1L to 1000000L).toVector

以及据称等效的Java代码

def jLongs: java.util.stream.LongStream = java.util.stream.LongStream
    .iterate(1L, (i: Long) => i <= 1000000L, (i: Long) => i + 1L)

当我运行以下代码的基准测试

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MILLISECONDS)
class BoxingScala {
  val longs: Vector[Long] = (1L to 1000000L).toVector
  def jLongs: java.util.stream.LongStream = java.util.stream.LongStream
    .iterate(1L, (i: Long) => i <= 1000000L, (i: Long) => i + 1L)

  @Benchmark def a: Long = longs.sum

  @Benchmark def b: java.lang.Long = jLongs.sum()
}

我发现Java代码大约快了400%。当我尝试理解其中的原因时,我找到了这段字节码:

// access flags 0x1
public a()J
@Lorg/openjdk/jmh/annotations/Benchmark;()
 L0
  LINENUMBER <benchmark a> L0
  ALOAD 0
  INVOKEVIRTUAL gurghet/BoxingScala.longs ()Lscala/collection/immutable/Vector;
  GETSTATIC scala/math/Numeric$LongIsIntegral$.MODULE$ : Lscala/math/Numeric$LongIsIntegral$;
  INVOKEVIRTUAL scala/collection/immutable/Vector.sum (Lscala/math/Numeric;)Ljava/lang/Object;
  INVOKESTATIC scala/runtime/BoxesRunTime.unboxToLong (Ljava/lang/Object;)J
  LRETURN
 L1
  LOCALVARIABLE this Lgurghet/BoxingScala; L0 L1 0
  MAXSTACK = 2
  MAXLOCALS = 1

// access flags 0x1
public b()Ljava/lang/Long;
@Lorg/openjdk/jmh/annotations/Benchmark;()
 L0
  LINENUMBER <benchmark b> L0
  GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
  ALOAD 0
  INVOKEVIRTUAL gurghet/BoxingScala.jLongs ()Ljava/util/stream/LongStream;
  INVOKEINTERFACE java/util/stream/LongStream.sum ()J (itf)
  INVOKEVIRTUAL scala/Predef$.long2Long (J)Ljava/lang/Long;
  ARETURN
 L1
  LOCALVARIABLE this Lgurghet/BoxingScala; L0 L1 0
  MAXSTACK = 3
  MAXLOCALS = 1

其中longs是通过以下方式初始化的

L1
  LINENUMBER <init> L1
  ALOAD 0
  NEW scala/runtime/RichLong
  DUP
  GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
  LCONST_1
  INVOKEVIRTUAL scala/Predef$.longWrapper (J)J
  INVOKESPECIAL scala/runtime/RichLong.<init> (J)V
  LDC 1000000
  INVOKESTATIC scala/runtime/BoxesRunTime.boxToLong (J)Ljava/lang/Long;
  INVOKEVIRTUAL scala/runtime/RichLong.to (Ljava/lang/Object;)Lscala/collection/immutable/NumericRange$Inclusive;
  INVOKEVIRTUAL scala/collection/immutable/NumericRange$Inclusive.toVector ()Lscala/collection/immutable/Vector;
  PUTFIELD gurghet/BoxingScala.longs : Lscala/collection/immutable/Vector;

因此,我认为Scala版本被强制加载了一百万个对象。

这是否是它如此慢的原因?我如何告诉它专门为long值进行优化?

另外,有趣且违反直觉的是,尽管Java代码返回一个对象,但在Scala中返回一个原始的long值(参见ARETURNLRETURN)。

英文:

I have the following simple code

val longs: Vector[Long] = (1L to 1000000L).toVector

and the supposedly equivalent Java

def jLongs: java.util.stream.LongStream = java.util.stream.LongStream
    .iterate(1L, (i: Long) =&gt; i &lt;= 1000000L, (i: Long) =&gt; i + 1L)

When I run a benchmark with the following code

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MILLISECONDS)
class BoxingScala {
  val longs: Vector[Long] = (1L to 1000000L).toVector
  def jLongs: java.util.stream.LongStream = java.util.stream.LongStream
    .iterate(1L, (i: Long) =&gt; i &lt;= 1000000L, (i: Long) =&gt; i + 1L)

  @Benchmark def a: Long = longs.sum

  @Benchmark def b: java.lang.Long = jLongs.sum()
}

I get that the Java code is roughly 400% faster. When I try to understand why, I find this bytecode:

  // access flags 0x1
  public a()J
  @Lorg/openjdk/jmh/annotations/Benchmark;()
   L0
    LINENUMBER &lt;benchmark a&gt; L0
    ALOAD 0
    INVOKEVIRTUAL gurghet/BoxingScala.longs ()Lscala/collection/immutable/Vector;
    GETSTATIC scala/math/Numeric$LongIsIntegral$.MODULE$ : Lscala/math/Numeric$LongIsIntegral$;
    INVOKEVIRTUAL scala/collection/immutable/Vector.sum (Lscala/math/Numeric;)Ljava/lang/Object;
    INVOKESTATIC scala/runtime/BoxesRunTime.unboxToLong (Ljava/lang/Object;)J
    LRETURN
   L1
    LOCALVARIABLE this Lgurghet/BoxingScala; L0 L1 0
    MAXSTACK = 2
    MAXLOCALS = 1

  // access flags 0x1
  public b()Ljava/lang/Long;
  @Lorg/openjdk/jmh/annotations/Benchmark;()
   L0
    LINENUMBER &lt;benchmark b&gt; L0
    GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
    ALOAD 0
    INVOKEVIRTUAL gurghet/BoxingScala.jLongs ()Ljava/util/stream/LongStream;
    INVOKEINTERFACE java/util/stream/LongStream.sum ()J (itf)
    INVOKEVIRTUAL scala/Predef$.long2Long (J)Ljava/lang/Long;
    ARETURN
   L1
    LOCALVARIABLE this Lgurghet/BoxingScala; L0 L1 0
    MAXSTACK = 3
    MAXLOCALS = 1

where longs is initialized by

 L1
    LINENUMBER &lt;init&gt; L1
    ALOAD 0
    NEW scala/runtime/RichLong
    DUP
    GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
    LCONST_1
    INVOKEVIRTUAL scala/Predef$.longWrapper (J)J
    INVOKESPECIAL scala/runtime/RichLong.&lt;init&gt; (J)V
    LDC 1000000
    INVOKESTATIC scala/runtime/BoxesRunTime.boxToLong (J)Ljava/lang/Long;
    INVOKEVIRTUAL scala/runtime/RichLong.to (Ljava/lang/Object;)Lscala/collection/immutable/NumericRange$Inclusive;
    INVOKEVIRTUAL scala/collection/immutable/NumericRange$Inclusive.toVector ()Lscala/collection/immutable/Vector;
    PUTFIELD gurghet/BoxingScala.longs : Lscala/collection/immutable/Vector;

So it seems to me that the scala version is forced to load a million objects.

Is this the reason why it's so slow? How can I tell to specialize for longs?

Also, it is interesting and counter intuitive the fact that, while the java code returns an object, in scala a primitive long is returned (cf. ARETURN vs. LRETURN).

答案1

得分: 2

&lt;!-- language-all: scala --&gt;

查看字节码是徒劳的区别在于`Stream`的本质`LongStream`根据需要产生元素它不是数据结构它是控制结构——潜在的循环用于某些其他数据源你的Java代码可以简化为

    var sum: Long = 0
    for(i &lt;- 1L to 1000000L) sum += i;

`Vector`是一个实际的数据结构实际上必须存储100000个`long`使得你的Scala版本本质上是

    val oops = new Array[java.lang.Long](1000000) // 装箱!
    for(i &lt;- 0 until 1000000) oops(i) = i + 1
    var sum: Long = 0
    for(i &lt;- 0 until 1000000) sum += oops(i)

这两者之间绝对没有等价关系还要注意`1L to 1000000L`是一个`NumericRange[Long]`它已经是一个Scala集合`(1 to 1000000L).sum`的速度比这两者都要快因为它使用简单的算术公式来计算结果最接近`LongStream`的东西实际上是`SeqView[Long]`你可以通过`(1L to 1000000).view`获得它如果在其上调用`sum`我认为集合库并不足够智能无法将其简化为对`NumericRange`的`sum`调用而是会像在Java版本中那样迭代它从而进行更近似的比较不过它不会被专门化为`Long`所以仍然会有装箱的惩罚
英文:

<!-- language-all: scala -->

Looking at the bytecode is futile. The difference is in what a Stream is. A LongStream produces elements on demand. It's not a data structure; it's a control structure—a latent loop over some other data source. Your Java boils down to

var sum: Long = 0
for(i &lt;- 1L to 1000000L) sum += i;

A Vector is an actual data structure, which actually has to store 100000 longs, making your Scala version essentially

val oops = new Array[java.lang.Long](1000000) // boxed!
for(i &lt;- 0 until 1000000) oops(i) = i + 1
var sum: Long = 0
for(i &lt;- 0 until 1000000) sum += oops(i)

There is absolutely no equivalence between these. Also note that 1L to 1000000L is a NumericRange[Long], which already is a Scala collection, and (1 to 1000000L).sum is much faster than either of these, since it uses a simple arithmetic formula to compute the result. The closest thing to a LongStream is actually a SeqView[Long], which you can get as (1L to 1000000).view. If you call sum on that, I believe the collections library is not smart enough to simplify that down to a sum call on the NumericRange, and will instead iterate it like in the Java version, making a closer comparison. It won't be specialized to Long, though, so it will still have the boxing penalty.

huangapple
  • 本文由 发表于 2020年4月6日 05:58:25
  • 转载请务必保留本文链接:https://go.coder-hub.com/61050081.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定