For loop performance: counters with same value vs. different values
Question
I have a loop with two counters, i and j. When both counters hold the same value, the loop runs much faster than when their values differ:
Benchmark                     Mode  Cnt       Score      Error  Units
FloatsArrayBenchmark.times   thrpt   20  341805.800 ± 1623.320  ops/s
FloatsArrayBenchmark.times2  thrpt   20  198764.909 ± 1608.387  ops/s
The Java bytecode is identical, which suggests the difference comes from some lower-level optimization. Can someone explain why this happens? Here is the benchmark:
import org.openjdk.jmh.annotations.*;

public class FloatsArrayBenchmark {

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(new String[]{FloatsArrayBenchmark.class.getSimpleName()});
    }

    // Both counters start at 0, so i == j on every iteration.
    @Benchmark @Fork(value = 1, warmups = 0)
    public void times(Data data) {
        float[] result = new float[10000];
        for (int i = 0, j = 0; i < 9_999; i++, j++)
            result[j] = data.floats[i] * 10;
    }

    // j starts at 1, so the write index is always one ahead of the read index.
    @Benchmark @Fork(value = 1, warmups = 0)
    public void times2(Data data) {
        float[] result = new float[10000];
        for (int i = 0, j = 1; i < 9_999; i++, j++)
            result[j] = data.floats[i] * 10;
    }

    @State(Scope.Benchmark)
    public static class Data {
        private final float[] floats = new float[10000];
    }
}
Environment:
- macOS, tried Java 8, Java 11, and Java 14
- 2.4 GHz Quad-Core Intel Core i5
Answer 1
Score: 3
In the first (faster) version, i always (effectively) has the same value as j, so this:
public void times(Data data) {
    float[] result = new float[10000];
    for (int i = 0, j = 0; i < 9_999; i++, j++)
        result[j] = data.floats[i] * 10;
}
can be rewritten without j, with identical effect:
public void times(Data data) {
    float[] result = new float[10000];
    for (int i = 0; i < 9_999; i++)
        result[i] = data.floats[i] * 10;
}
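To check that the rewrite really is equivalent, here is a minimal sketch that runs both loop shapes on the same input and compares the output arrays. The class name LoopEquivalence, the method names, and the test data are made up for illustration; only the loop bodies come from the question.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the original benchmark.
class LoopEquivalence {
    static final int SIZE = 10_000;

    // Loop as written in the question: i and j start equal and advance together.
    static float[] twoCounters(float[] input) {
        float[] result = new float[SIZE];
        for (int i = 0, j = 0; i < SIZE - 1; i++, j++) {
            result[j] = input[i] * 10;
        }
        return result;
    }

    // Same loop with the redundant counter j removed.
    static float[] oneCounter(float[] input) {
        float[] result = new float[SIZE];
        for (int i = 0; i < SIZE - 1; i++) {
            result[i] = input[i] * 10;
        }
        return result;
    }

    public static void main(String[] args) {
        float[] input = new float[SIZE];
        for (int k = 0; k < SIZE; k++) {
            input[k] = k * 0.5f; // arbitrary test data (assumption, not from the benchmark)
        }
        // Prints true when both loops wrote exactly the same values.
        System.out.println(Arrays.equals(twoCounters(input), oneCounter(input)));
    }
}
```

This only shows that the two forms compute the same result. Whether the JIT compiler actually performs this rewrite is not something the sketch can demonstrate.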
It is likely that the compiler recognised that j is redundant and eliminated it, halving the number of ++ operations performed. The eliminated increment is one of the three arithmetic operations in each iteration (two increments and one multiplication). This is broadly consistent with the timings: the second version takes about 70% longer per iteration, which is in the same range as the 50% expected from a 3:2 ratio of operations.