英文:
Scala optimization witth specialized types: sum with longs
问题
以下是翻译好的内容:
我有以下简单的代码
val longs: Vector[Long] = (1L to 1000000L).toVector
以及据称等效的Java代码
def jLongs: java.util.stream.LongStream = java.util.stream.LongStream
.iterate(1L, (i: Long) => i <= 1000000L, (i: Long) => i + 1L)
当我运行以下代码的基准测试
@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MILLISECONDS)
class BoxingScala {
val longs: Vector[Long] = (1L to 1000000L).toVector
def jLongs: java.util.stream.LongStream = java.util.stream.LongStream
.iterate(1L, (i: Long) => i <= 1000000L, (i: Long) => i + 1L)
@Benchmark def a: Long = longs.sum
@Benchmark def b: java.lang.Long = jLongs.sum()
}
我发现Java代码大约快了400%。当我尝试理解其中的原因时,我找到了这段字节码:
// access flags 0x1
public a()J
@Lorg/openjdk/jmh/annotations/Benchmark;()
L0
LINENUMBER <benchmark a> L0
ALOAD 0
INVOKEVIRTUAL gurghet/BoxingScala.longs ()Lscala/collection/immutable/Vector;
GETSTATIC scala/math/Numeric$LongIsIntegral$.MODULE$ : Lscala/math/Numeric$LongIsIntegral$;
INVOKEVIRTUAL scala/collection/immutable/Vector.sum (Lscala/math/Numeric;)Ljava/lang/Object;
INVOKESTATIC scala/runtime/BoxesRunTime.unboxToLong (Ljava/lang/Object;)J
LRETURN
L1
LOCALVARIABLE this Lgurghet/BoxingScala; L0 L1 0
MAXSTACK = 2
MAXLOCALS = 1
// access flags 0x1
public b()Ljava/lang/Long;
@Lorg/openjdk/jmh/annotations/Benchmark;()
L0
LINENUMBER <benchmark b> L0
GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
ALOAD 0
INVOKEVIRTUAL gurghet/BoxingScala.jLongs ()Ljava/util/stream/LongStream;
INVOKEINTERFACE java/util/stream/LongStream.sum ()J (itf)
INVOKEVIRTUAL scala/Predef$.long2Long (J)Ljava/lang/Long;
ARETURN
L1
LOCALVARIABLE this Lgurghet/BoxingScala; L0 L1 0
MAXSTACK = 3
MAXLOCALS = 1
其中longs
是通过以下方式初始化的
L1
LINENUMBER <init> L1
ALOAD 0
NEW scala/runtime/RichLong
DUP
GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
LCONST_1
INVOKEVIRTUAL scala/Predef$.longWrapper (J)J
INVOKESPECIAL scala/runtime/RichLong.<init> (J)V
LDC 1000000
INVOKESTATIC scala/runtime/BoxesRunTime.boxToLong (J)Ljava/lang/Long;
INVOKEVIRTUAL scala/runtime/RichLong.to (Ljava/lang/Object;)Lscala/collection/immutable/NumericRange$Inclusive;
INVOKEVIRTUAL scala/collection/immutable/NumericRange$Inclusive.toVector ()Lscala/collection/immutable/Vector;
PUTFIELD gurghet/BoxingScala.longs : Lscala/collection/immutable/Vector;
因此,我认为Scala版本被强制加载了一百万个对象。
这是否是它如此慢的原因?我如何告诉它专门为long值进行优化?
另外,有趣且违反直觉的是,尽管Java代码返回一个对象,但在Scala中返回一个原始的long值(参见ARETURN
与LRETURN
)。
英文:
I have the following simple code
val longs: Vector[Long] = (1L to 1000000L).toVector
and the supposedly equivalent Java
def jLongs: java.util.stream.LongStream = java.util.stream.LongStream
.iterate(1L, (i: Long) => i <= 1000000L, (i: Long) => i + 1L)
When I run a benchmark with the following code
@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MILLISECONDS)
class BoxingScala {
val longs: Vector[Long] = (1L to 1000000L).toVector
def jLongs: java.util.stream.LongStream = java.util.stream.LongStream
.iterate(1L, (i: Long) => i <= 1000000L, (i: Long) => i + 1L)
@Benchmark def a: Long = longs.sum
@Benchmark def b: java.lang.Long = jLongs.sum()
}
I get that the Java code is roughly 400% faster. When I try to understand why, I find this bytecode:
// access flags 0x1
public a()J
@Lorg/openjdk/jmh/annotations/Benchmark;()
L0
LINENUMBER <benchmark a> L0
ALOAD 0
INVOKEVIRTUAL gurghet/BoxingScala.longs ()Lscala/collection/immutable/Vector;
GETSTATIC scala/math/Numeric$LongIsIntegral$.MODULE$ : Lscala/math/Numeric$LongIsIntegral$;
INVOKEVIRTUAL scala/collection/immutable/Vector.sum (Lscala/math/Numeric;)Ljava/lang/Object;
INVOKESTATIC scala/runtime/BoxesRunTime.unboxToLong (Ljava/lang/Object;)J
LRETURN
L1
LOCALVARIABLE this Lgurghet/BoxingScala; L0 L1 0
MAXSTACK = 2
MAXLOCALS = 1
// access flags 0x1
public b()Ljava/lang/Long;
@Lorg/openjdk/jmh/annotations/Benchmark;()
L0
LINENUMBER <benchmark b> L0
GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
ALOAD 0
INVOKEVIRTUAL gurghet/BoxingScala.jLongs ()Ljava/util/stream/LongStream;
INVOKEINTERFACE java/util/stream/LongStream.sum ()J (itf)
INVOKEVIRTUAL scala/Predef$.long2Long (J)Ljava/lang/Long;
ARETURN
L1
LOCALVARIABLE this Lgurghet/BoxingScala; L0 L1 0
MAXSTACK = 3
MAXLOCALS = 1
where longs
is initialized by
L1
LINENUMBER <init> L1
ALOAD 0
NEW scala/runtime/RichLong
DUP
GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
LCONST_1
INVOKEVIRTUAL scala/Predef$.longWrapper (J)J
INVOKESPECIAL scala/runtime/RichLong.<init> (J)V
LDC 1000000
INVOKESTATIC scala/runtime/BoxesRunTime.boxToLong (J)Ljava/lang/Long;
INVOKEVIRTUAL scala/runtime/RichLong.to (Ljava/lang/Object;)Lscala/collection/immutable/NumericRange$Inclusive;
INVOKEVIRTUAL scala/collection/immutable/NumericRange$Inclusive.toVector ()Lscala/collection/immutable/Vector;
PUTFIELD gurghet/BoxingScala.longs : Lscala/collection/immutable/Vector;
So it seems to me that the scala version is forced to load a million objects.
Is this the reason why it's so slow? How can I tell to specialize for longs?
Also, it is interesting and counter intuitive the fact that, while the java code returns an object, in scala a primitive long is returned (cf. ARETURN
vs. LRETURN
).
答案1
得分: 2
<!-- language-all: scala -->
查看字节码是徒劳的。区别在于`Stream`的本质。`LongStream`根据需要产生元素。它不是数据结构;它是控制结构——潜在的循环,用于某些其他数据源。你的Java代码可以简化为
var sum: Long = 0
for(i <- 1L to 1000000L) sum += i;
`Vector`是一个实际的数据结构,实际上必须存储100000个`long`,使得你的Scala版本本质上是
val oops = new Array[java.lang.Long](1000000) // 装箱!
for(i <- 0 until 1000000) oops(i) = i + 1
var sum: Long = 0
for(i <- 0 until 1000000) sum += oops(i)
这两者之间绝对没有等价关系。还要注意,`1L to 1000000L`是一个`NumericRange[Long]`,它已经是一个Scala集合,`(1 to 1000000L).sum`的速度比这两者都要快,因为它使用简单的算术公式来计算结果。最接近`LongStream`的东西实际上是`SeqView[Long]`,你可以通过`(1L to 1000000).view`获得它。如果在其上调用`sum`,我认为集合库并不足够智能,无法将其简化为对`NumericRange`的`sum`调用,而是会像在Java版本中那样迭代它,从而进行更近似的比较。不过它不会被专门化为`Long`,所以仍然会有装箱的惩罚。
英文:
<!-- language-all: scala -->
Looking at the bytecode is futile. The difference is in what a Stream
is. A LongStream
produces elements on demand. It's not a data structure; it's a control structure—a latent loop over some other data source. Your Java boils down to
var sum: Long = 0
for(i <- 1L to 1000000L) sum += i;
A Vector
is an actual data structure, which actually has to store 100000 long
s, making your Scala version essentially
val oops = new Array[java.lang.Long](1000000) // boxed!
for(i <- 0 until 1000000) oops(i) = i + 1
var sum: Long = 0
for(i <- 0 until 1000000) sum += oops(i)
There is absolutely no equivalence between these. Also note that 1L to 1000000L
is a NumericRange[Long]
, which already is a Scala collection, and (1 to 1000000L).sum
is much faster than either of these, since it uses a simple arithmetic formula to compute the result. The closest thing to a LongStream
is actually a SeqView[Long]
, which you can get as (1L to 1000000).view
. If you call sum
on that, I believe the collections library is not smart enough to simplify that down to a sum
call on the NumericRange
, and will instead iterate it like in the Java version, making a closer comparison. It won't be specialized to Long
, though, so it will still have the boxing penalty.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论