Different performance of "if" and "if else" in Java

Question


<details>
<summary>English:</summary>

I noticed that an `if else` / ternary (`condition ? a : b`) assignment is faster than a conditional assignment in an `if`-only statement. I ran JMH benchmarks on different JDKs, but I will focus on JDK 12.

**(ops / sec, higher is better)**
[![JMH benchmark][1]][1]

**Source code:**

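    import java.util.Random;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.CompilerControl;
    import org.openjdk.jmh.annotations.Level;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.infra.Blackhole;

    @State(Scope.Benchmark)
    public class FindMaxBenchmark {
        public static int SIZE = 1_000_000;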
    
        @Benchmark
        @CompilerControl(CompilerControl.Mode.DONT_INLINE)
        public static void findMax_if(Blackhole bh, Mock mock) {
            int result = Integer.MIN_VALUE;
            int[] data = mock.tab;
    
            for (int i = 0; i < data.length; i++) {
                if (data[i] > result) {
                    result = data[i];
                }
            }
    
            bh.consume(result);
        }
    
        @Benchmark
        @CompilerControl(CompilerControl.Mode.DONT_INLINE)
        public static void findMax_if_else(Blackhole bh, Mock mock) {
            int result = Integer.MIN_VALUE;
            int[] data = mock.tab;
    
            for (int i = 0; i < data.length; i++) {
                if (data[i] > result) {
                    result = data[i];
                } else {
                    result = result;
                }
            }
    
            bh.consume(result);
        }
    
        @Benchmark
        @CompilerControl(CompilerControl.Mode.DONT_INLINE)
        public static void findMax_ternary(Blackhole bh, Mock mock) {
            int result = Integer.MIN_VALUE;
            int[] data = mock.tab;
    
            for (int i = 0; i < data.length; i++) {
                result = data[i] > result ? data[i] : result;
            }
    
            bh.consume(result);
        }
    
        @Benchmark
        @CompilerControl(CompilerControl.Mode.DONT_INLINE)
        public static void findMax_intrinsicMax(Blackhole bh, Mock mock) {
            int result = Integer.MIN_VALUE;
            int[] data = mock.tab;
    
            for (int i = 0; i < data.length; i++) {
                result = Math.max(data[i], result);
            }
    
            bh.consume(result);
        }
    
        @State(Scope.Thread)
        public static class Mock {
            private int[] tab = new int[SIZE];
    
            public int[] getTab() {
                return tab;
            }
    
            @Setup(Level.Iteration)
            public void setup() {
                Random r = new Random();
                this.tab = r.ints(SIZE).toArray();
            }
        }
    }

**`findMax_if_else` perfasm output (ternary is almost the same):**

    c2, level 4, codes.dbg.FindMaxBenchmark::findMax_if_else, version 493 (165 bytes)
    
                                 0x00007fc7a8671a6b: cmp    r8d,ebp
                                0x00007fc7a8671a6e: jae    0x00007fc7a8671b3d
                                0x00007fc7a8671a74: mov    edx,DWORD PTR [r9+0x10]        ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                                                          ; - codes.dbg.FindMaxBenchmark::findMax_if_else@21 (line 34)
                                0x00007fc7a8671a78: cmp    edx,0x80000000
             │╭                  0x00007fc7a8671a7e: jg     0x00007fc7a8671a85             ;*if_icmple {reexecute=0 rethrow=0 return_oop=0}
             ││                                                                            ; - codes.dbg.FindMaxBenchmark::findMax_if_else@23 (line 34)
             ││                  0x00007fc7a8671a80: mov    edx,0x80000000                 ;*iinc {reexecute=0 rethrow=0 return_oop=0}
             ││                                                                            ; - codes.dbg.FindMaxBenchmark::findMax_if_else@36 (line 33)
             │↘                  0x00007fc7a8671a85: mov    ebx,ebp
      0.02%                     0x00007fc7a8671a87: add    ebx,0xfffffffd
                                0x00007fc7a8671a8a: cmp    r8d,ebx
                                0x00007fc7a8671a8d: cmovl  ebx,r11d
                                0x00007fc7a8671a91: mov    r8d,0x1
      0.00%                     0x00007fc7a8671a97: cmp    ebx,0x1
                               0x00007fc7a8671a9a: jle    0x00007fc7a8671b00
                               0x00007fc7a8671a9c: mov    rdi,r9                         ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                                                                         ; - codes.dbg.FindMaxBenchmark::findMax_if_else@39 (line 33)
              │╭                0x00007fc7a8671a9f: jmp    0x00007fc7a8671ab9
      0.01%   ││               0x00007fc7a8671aa1: mov    edx,ecx
              ││               0x00007fc7a8671aa3: nop    DWORD PTR [rax+0x0]
              ││               0x00007fc7a8671aaa: nop    WORD PTR [rax+rax*1+0x0]
      8.06%   ││    ↗│          0x00007fc7a8671ab0: add    r8d,0x4                        ;*iinc {reexecute=0 rethrow=0 return_oop=0}
              ││    ││                                                                    ; - codes.dbg.FindMaxBenchmark::findMax_if_else@36 (line 33)
     11.38%   ││    ││          0x00007fc7a8671ab4: cmp    r8d,ebx
     13.63%   ││╭   ││          0x00007fc7a8671ab7: jge    0x00007fc7a8671af1             ;*aload_3 {reexecute=0 rethrow=0 return_oop=0}
              │││   ││                                                                    ; - codes.dbg.FindMaxBenchmark::findMax_if_else@18 (line 34)
      3.02%   │↘│   ││         0x00007fc7a8671ab9: mov    r11d,DWORD PTR [r9+r8*4+0x10]  ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                  ││                                                                   ; - codes.dbg.FindMaxBenchmark::findMax_if_else@21 (line 34)
      8.53%       ││         0x00007fc7a8671abe: cmp    r11d,edx
      4.54%    │╭  ││         0x00007fc7a8671ac1: jg     0x00007fc7a8671ae2             ;*iinc {reexecute=0 rethrow=0 return_oop=0}
               ││  ││                                                                   ; - codes.dbg.FindMaxBenchmark::findMax_if_else@36 (line 33)
      4.96%    ││  ││↗        0x00007fc7a8671ac3: mov    r11d,DWORD PTR [r9+r8*4+0x14]  ;*iaload {reexecute=0 rethrow=0 return_oop=0}
               ││  │││                                                                  ; - codes.dbg.FindMaxBenchmark::findMax_if_else@21 (line 34)
      3.73%    ││  │││        0x00007fc7a8671ac8: cmp    r11d,edx
      9.19%    ││╭ │││        0x00007fc7a8671acb: jg     0x00007fc7a8671ae7             ;*iinc {reexecute=0 rethrow=0 return_oop=0}
               │││ │││                                                                  ; - codes.dbg.FindMaxBenchmark::findMax_if_else@36 (line 33)
      3.70%    │││ │││↗       0x00007fc7a8671acd: mov    r11d,DWORD PTR [r9+r8*4+0x18]  ;*iaload {reexecute=0 rethrow=0 return_oop=0}
               │││ ││││                                                                 ; - codes.dbg.FindMaxBenchmark::findMax_if_else@21 (line 34)
      4.96%    │││ ││││       0x00007fc7a8671ad2: cmp    r11d,edx
      4.45%    │││╭││││       0x00007fc7a8671ad5: jg     0x00007fc7a8671aec             ;*iinc {reexecute=0 rethrow=0 return_oop=0}
               ││││││││                                                                 ; - codes.dbg.FindMaxBenchmark::findMax_if_else@36 (line 33)
      8.55%    ││││││││↗│      0x00007fc7a8671ad7: mov    ecx,DWORD PTR [r9+r8*4+0x1c]   ;*iaload {reexecute=0 rethrow=0 return_oop=0}
               ││││││││││                                                                ; - codes.dbg.FindMaxBenchmark::findMax_if_else@21 (line 34)
      6.11%    ││││││││││      0x00007fc7a8671adc: cmp    ecx,edx
      2.48%    ││││╰│││││      0x00007fc7a8671ade: jle    0x00007fc7a8671ab0             ;*if_icmple {reexecute=0 rethrow=0 return_oop=0}
               ││││ │││││                                                                ; - codes.dbg.FindMaxBenchmark::findMax_if_else@23 (line 34)
               ││││ ╰││││      0x00007fc7a8671ae0: jmp    0x00007fc7a8671aa1
               │↘││  ││││      0x00007fc7a8671ae2: mov    edx,r11d
      0.00%     ││  ╰│││      0x00007fc7a8671ae5: jmp    0x00007fc7a8671ac3
      0.00%     ↘│   │││      0x00007fc7a8671ae7: mov    edx,r11d
      0.00%         ╰││      0x00007fc7a8671aea: jmp    0x00007fc7a8671acd
      0.00%          ││      0x00007fc7a8671aec: mov    edx,r11d
      0.00%           ╰│      0x00007fc7a8671aef: jmp    0x00007fc7a8671ad7
                             0x00007fc7a8671af1: mov    r11,QWORD PTR [r15+0x108]      ; ImmutableOopMap{r10=Oop r9=NarrowOop rdi=Oop }
                                                                                        ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                                                        ; - codes.dbg.FindMaxBenchmark::findMax_if_else@39 (line 33)
                              0x00007fc7a8671af8: test   DWORD PTR [r11],eax            ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                                                                        ; - codes.dbg.FindMaxBenchmark::findMax_if_else@39 (line 33)
                                                                                        ;   {poll}
                              0x00007fc7a8671afb: cmp    r8d,ebx
      0.00%                   0x00007fc7a8671afe: jl     0x00007fc7a8671ab9
                               0x00007fc7a8671b00: cmp    r8d,ebp
      0.00%                    0x00007fc7a8671b03: jge    0x00007fc7a8671b1a
                               0x00007fc7a8671b05: data16 xchg ax,ax                     ;*aload_3 {reexecute=0 rethrow=0 return_oop=0}
                                                                                         ; - codes.dbg.FindMaxBenchmark::findMax_if_else@18 (line 34)
                              0x00007fc7a8671b08: mov    r11d,DWORD PTR [r9+r8*4+0x10]  ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                                                        ; - codes.dbg.FindMaxBenchmark::findMax_if_else@21 (line 34)
      0.01%                   0x00007fc7a8671b0d: cmp    r11d,edx
                          │╭│   0x00007fc7a8671b10: jg     0x00007fc7a8671b38
                          │││↗  0x00007fc7a8671b12: inc    r8d                            ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                          ││││                                                            ; - codes.dbg.FindMaxBenchmark::findMax_if_else@36 (line 33)
                          ││││  0x00007fc7a8671b15: cmp    r8d,ebp
                          ││╰│  0x00007fc7a8671b18: jl     0x00007fc7a8671b08             ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                          ││                                                             ; - codes.dbg.FindMaxBenchmark::findMax_if_else@15 (line 33)
                          ↘│   0x00007fc7a8671b1a: test   r10,r10
      0.00%                   0x00007fc7a8671b1d: je     0x00007fc7a8671b52
                              0x00007fc7a8671b1f: mov    rsi,r10
                              0x00007fc7a8671b22: nop
                              0x00007fc7a8671b23: call   0x00007fc7a8671ba0             ; ImmutableOopMap{}
                                                                                        ;*invokevirtual consume {reexecute=0 rethrow=0 return_oop=0}
                                                                                        ; - codes.dbg.FindMaxBenchmark::findMax_if_else@44 (line 41)
                                                                                        ;   {optimized virtual_call}
                              0x00007fc7a8671b28: add    rsp,0x20
      0.01%                   0x00007fc7a8671b2c: pop    rbp
                              0x00007fc7a8671b2d: mov    r10,QWORD PTR [r15+0x108]
                              0x00007fc7a8671b34: test   DWORD PTR [r10],eax            ;   {poll_return}
                              0x00007fc7a8671b37: ret
                              0x00007fc7a8671b38: mov    edx,r11d
                               0x00007fc7a8671b3b: jmp    0x00007fc7a8671b12
                                0x00007fc7a8671b3d: mov    esi,0xffffff7e
                                 0x00007fc7a8671b42: mov    QWORD PTR [rsp],r10
                                 0x00007fc7a8671b46: mov    DWORD PTR [rsp+0x8],r9d
                                 0x00007fc7a8671b4b: call   0x00007fc7a0ba3d00             ; ImmutableOopMap{[0]=Oop [8]=NarrowOop }
                                                                                           ;*if_icmpge {reexecute=1 rethrow=0 return_oop=0}

**`findMax_if` perfasm output:**

    c2, level 4, codes.dbg.FindMaxBenchmark::findMax_if, version 480 (165 bytes)
    
                                  0x00007f34cc66e7eb: cmp    r8d,ebp
                                 0x00007f34cc66e7ee: jae    0x00007f34cc66e8c4
                                 0x00007f34cc66e7f4: mov    edx,DWORD PTR [r9+0x10]        ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                                                           ; - codes.dbg.FindMaxBenchmark::findMax_if@21 (line 19)
                                 0x00007f34cc66e7f8: cmp    edx,0x80000000
             │╭                   0x00007f34cc66e7fe: jg     0x00007f34cc66e805             ;*if_icmple {reexecute=0 rethrow=0 return_oop=0}
             ││                                                                             ; - codes.dbg.FindMaxBenchmark::findMax_if@23 (line 19)
             ││                   0x00007f34cc66e800: mov    edx,0x80000000                 ;*iinc {reexecute=0 rethrow=0 return_oop=0}
             ││                                                                             ; - codes.dbg.FindMaxBenchmark::findMax_if@31 (line 18)
             │↘                   0x00007f34cc66e805: mov    ebx,ebp
      0.01%                      0x00007f34cc66e807: add    ebx,0xfffffffd
                                 0x00007f34cc66e80a: cmp    r8d,ebx
                                 0x00007f34cc66e80d: cmovl  ebx,r11d
                                 0x00007f34cc66e811: mov    r8d,0x1
                                 0x00007f34cc66e817: cmp    ebx,0x1
                                0x00007f34cc66e81a: jle    0x00007f34cc66e880
                                0x00007f34cc66e81c: mov    rdi,r9                         ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                                                                          ; - codes.dbg.FindMaxBenchmark::findMax_if@34 (line 18)
              │╭                 0x00007f34cc66e81f: jmp    0x00007f34cc66e839
              ││                0x00007f34cc66e821: mov    edx,ecx
      0.00%   ││                0x00007f34cc66e823: nop    DWORD PTR [rax+0x0]
              ││                0x00007f34cc66e82a: nop    WORD PTR [rax+rax*1+0x0]
      0.89%   ││    │↗           0x00007f34cc66e830: add    r8d,0x4                        ;*iinc {reexecute=0 rethrow=0 return_oop=0}
              ││    ││                                                                     ; - codes.dbg.FindMaxBenchmark::findMax_if@31 (line 18)
     12.36%   ││    ││           0x00007f34cc66e834: cmp    r8d,ebx
      0.11%   ││╭   ││           0x00007f34cc66e837: jge    0x00007f34cc66e871             ;*aload_3 {reexecute=0 rethrow=0 return_oop=0}
              │││   ││                                                                     ; - codes.dbg.FindMaxBenchmark::findMax_if@18 (line 19)
      9.94%   │↘│   ││          0x00007f34cc66e839: mov    r11d,DWORD PTR [r9+r8*4+0x10]  ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                  ││                                                                    ; - codes.dbg.FindMaxBenchmark::findMax_if@21 (line 19)
      0.11%       ││          0x00007f34cc66e83e: cmp    r11d,edx
     10.05%    │╭  ││          0x00007f34cc66e841: jg     0x00007f34cc66e862             ;*iinc {reexecute=0 rethrow=0 return_oop=0}
               ││  ││                                                                    ; - codes.dbg.FindMaxBenchmark::findMax_if@31 (line 18)
      0.13%    ││  ││↗         0x00007f34cc66e843: mov    r11d,DWORD PTR [r9+r8*4+0x14]  ;*iaload {reexecute=0 rethrow=0 return_oop=0}
               ││  │││                                                                   ; - codes.dbg.FindMaxBenchmark::findMax_if@21 (line 19)
      9.84%    ││  │││         0x00007f34cc66e848: cmp    r11d,edx
      0.11%    ││╭ │││         0x00007f34cc66e84b: jg     0x00007f34cc66e867             ;*iinc {reexecute=0 rethrow=0 return_oop=0}
               │││ │││                                                                   ; - codes.dbg.FindMaxBenchmark::findMax_if@31 (line 18)
     10.02%    │││ │││↗        0x00007f34cc66e84d: mov    r11d,DWORD PTR [r9+r8*4+0x18]  ;*iaload {reexecute=0 rethrow=0 return_oop=0}
               │││ ││││                                                                  ; - codes.dbg.FindMaxBenchmark::findMax_if@21 (line 19)
      0.33%    │││ ││││        0x00007f34cc66e852: cmp    r11d,edx
     23.63%    │││╭││││        0x00007f34cc66e855: jg     0x00007f34cc66e86c             ;*iinc {reexecute=0 rethrow=0 return_oop=0}
               ││││││││                                                                  ; - codes.dbg.FindMaxBenchmark::findMax_if@31 (line 18)
      0.13%    ││││││││↗│       0x00007f34cc66e857: mov    ecx,DWORD PTR [r9+r8*4+0x1c]   ;*iaload {reexecute=0 rethrow=0 return_oop=0}
               ││││││││││                                                                 ; - codes.dbg.FindMaxBenchmark::findMax_if@21 (line 19)
      9.89%    ││││││││││       0x00007f34cc66e85c: cmp    ecx,edx
      0.11%    ││││╰│││││       0x00007f34cc66e85e: jg     0x00007f34cc66e821             ;*if_icmple {reexecute=0 rethrow=0 return_oop=0}
               ││││ │││││                                                                 ; - codes.dbg.FindMaxBenchmark::findMax_if@23 (line 19)
      9.71%    ││││ ╰││││       0x00007f34cc66e860: jmp    0x00007f34cc66e830
               │↘││  ││││       0x00007f34cc66e862: mov    edx,r11d
      0.00%     ││  ╰│││       0x00007f34cc66e865: jmp    0x00007f34cc66e843
                ↘│   │││       0x00007f34cc66e867: mov    edx,r11d
      0.00%         ╰││       0x00007f34cc66e86a: jmp    0x00007f34cc66e84d
                     ││       0x00007f34cc66e86c: mov    edx,r11d
      0.00%           ╰│       0x00007f34cc66e86f: jmp    0x00007f34cc66e857
                              0x00007f34cc66e871: mov    r11,QWORD PTR [r15+0x108]      ; ImmutableOopMap{r10=Oop r9=NarrowOop rdi=Oop }
                                                                                         ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                                                         ; - codes.dbg.FindMaxBenchmark::findMax_if@34 (line 18)
      0.00%                    0x00007f34cc66e878: test   DWORD PTR [r11],eax            ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                                                                         ; - codes.dbg.FindMaxBenchmark::findMax_if@34 (line 18)
                                                                                         ;   {poll}
                               0x00007f34cc66e87b: cmp    r8d,ebx
                               0x00007f34cc66e87e: jl     0x00007f34cc66e839
                                0x00007f34cc66e880: cmp    r8d,ebp
      0.00%                     0x00007f34cc66e883: jge    0x00007f34cc66e89a
                                0x00007f34cc66e885: data16 xchg ax,ax                     ;*aload_3 {reexecute=0 rethrow=0 return_oop=0}
                                                                                          ; - codes.dbg.FindMaxBenchmark::findMax_if@18 (line 19)
      0.00%                    0x00007f34cc66e888: mov    r11d,DWORD PTR [r9+r8*4+0x10]  ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                                                         ; - codes.dbg.FindMaxBenchmark::findMax_if@21 (line 19)
      0.01%                    0x00007f34cc66e88d: cmp    r11d,edx
                          │╭│    0x00007f34cc66e890: jg     0x00007f34cc66e8b8
                          │││↗   0x00007f34cc66e892: inc    r8d                            ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                          ││││                                                             ; - codes.dbg.FindMaxBenchmark::findMax_if@31 (line 18)
                          ││││   0x00007f34cc66e895: cmp    r8d,ebp
                          ││╰│   0x00007f34cc66e898: jl     0x00007f34cc66e888             ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                          ││                                                              ; - codes.dbg.FindMaxBenchmark::findMax_if@15 (line 18)
                          ↘│ │↗  0x00007f34cc66e89a: test   r10,r10
      0.00%                 ││  0x00007f34cc66e89d: je     0x00007f34cc66e8da
                            ││  0x00007f34cc66e89f: mov    rsi,r10
                            ││  0x00007f34cc66e8a2: nop
                            ││  0x00007f34cc66e8a3: call   0x00007f34cc66e920             ; ImmutableOopMap{}
                            ││                                                            ;*invokevirtual consume {reexecute=0 rethrow=0 return_oop=0}
                            ││                                                            ; - codes.dbg.FindMaxBenchmark::findMax_if@39 (line 24)
                            ││                                                            ;   {optimized virtual_call}
      0.00%                 ││  0x00007f34cc66e8a8: add    rsp,0x20
      0.01%                 ││  0x00007f34cc66e8ac: pop    rbp
                            ││  0x00007f34cc66e8ad: mov    r10,QWORD PTR [r15+0x108]
                            ││  0x00007f34cc66e8b4: test   DWORD PTR [r10],eax            ;   {poll_return}
                            ││  0x00007f34cc66e8b7: ret
                            ││  0x00007f34cc66e8b8: mov    edx,r11d
                             ╰│  0x00007f34cc66e8bb: jmp    0x00007f34cc66e892
                                0x00007f34cc66e8bd: mov    edx,0x80000000
                                0x00007f34cc66e8c2: jmp    0x00007f34cc66e89a
                                 0x00007f34cc66e8c4: mov    esi,0xffffff7e
                                  0x00007f34cc66e8c9: mov    QWORD PTR [rsp],r10
                                  0x00007f34cc66e8cd: mov    DWORD PTR [rsp+0x8],r9d
    ....................................................................................................

**Observations:**
  - There is only one significant difference between `findMax_if` and `findMax_if_else`:

`0x00007f34cc66e85e: jg 0x00007f34cc66e821` vs `0x00007fc7a8671ade: jle 0x00007fc7a8671ab0`
  - `findMax_intrinsicMax`, which leverages the intrinsic `Math.max`, has the worst performance, which is counterintuitive to me.

**Questions:**
  - Is it normal to add an `else` branch containing code that doesn't change anything (like `x = x;`)? Especially in code that is executed on a single thread.
  - Where is the real source of the throughput difference? I see that one variant uses `jg` (jump if greater) where the other uses `jle` (jump if less or equal). Effectively, the first condition is the inverse of the second.
  - What is the point of using `Math.max` if a simple `if else` statement has higher throughput?

[Source code on GitHub][2]

`run_tests.sh` runs the benchmarks and generates the plot.


  [1]: https://i.stack.imgur.com/nxuCG.png
  [2]: https://github.com/JakubBialy/javaifternarybenchmark

</details>


# 答案1
**得分**: 6

```markdown
首先,为了减少无关的ASM代码量并简化分析,让我们添加以下JVM选项

 - `-XX:LoopUnrollLimit=0` - 关闭循环展开;
 - `-XX:-UseCountedLoopSafepoints` - 从循环中消除安全点轮询。

现在,有利于`if_else`的性能差异将更大,同时结果汇编代码将会更简单。以下是两个基准测试的循环体代码。

**findMax_if**

```assembly
              0x0000029707af78f5: jmp     29707af7908h
             0x0000029707af78f7: mov     r8d,ecx
             0x0000029707af78fa: nop     word ptr [rax+rax+0h]
  0,66%   │↗  0x0000029707af7900: inc     r9d               ;*iinc {reexecute=0 rethrow=0 return_oop=0}
          ││                                                ; - codes.dbg.FindMaxBenchmark::findMax_if@31 (line 18)
  1,02%   ││  0x0000029707af7903: cmp     r9d,r10d
         │╭││  0x0000029707af7906: jnl     29707af7914h      ;*aload_3 {reexecute=0 rethrow=0 return_oop=0}
         ││││                                                ; - codes.dbg.FindMaxBenchmark::findMax_if@18 (line 19)
  2,06%  ↘│││  0x0000029707af7908: mov     ecx,dword ptr [r11+r9*4+10h]
          │││                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
          │││                                                ; - codes.dbg.FindMaxBenchmark::findMax_if@21 (line 19)
 50,86%   │││  0x0000029707af790d: cmp     ecx,r8d
  0,02%   │╰│  0x0000029707af7910: jnle    29707af78f7h      ;*if_icmple {reexecute=0 rethrow=0 return_oop=0}
                                                           ; - codes.dbg.FindMaxBenchmark::findMax_if@23 (line 19)
 41,01%      0x0000029707af7912: jmp     29707af7900h      ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - codes.dbg.FindMaxBenchmark::findMax_if@15 (line 18)
              0x0000029707af7914: test    rbx,rbx

findMax_if_else

         ╭     0x00000137d24d4b75: jmp     137d24d4b88h
│  ↗  0x00000137d24d4b77: mov     r8d,ecx
│  │  0x00000137d24d4b7a: nop     word ptr [rax+rax+0h]
72,63%  │ ↗│  0x00000137d24d4b80: inc     r9d               ;*iinc {reexecute=0 rethrow=0 return_oop=0}
│ ││                                                ; - codes.dbg.FindMaxBenchmark::findMax_if_else@36 (line 33)
0,05%  │ ││  0x00000137d24d4b83: cmp     r9d,r10d
0,01%  │╭││  0x00000137d24d4b86: jnl     137d24d4b94h      ;*aload_3 {reexecute=0 rethrow=0 return_oop=0}
││││                                                ; - codes.dbg.FindMaxBenchmark::findMax_if_else@18 (line 34)
6,47%  ↘│││  0x00000137d24d4b88: mov     ecx,dword ptr [r11+r9*4+10h]
│││                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
│││                                                ; - codes.dbg.FindMaxBenchmark::findMax_if_else@21 (line 34)
15,93%   │││  0x00000137d24d4b8d: cmp     ecx,r8d
0,18%   │╰│  0x00000137d24d4b90: jle     137d24d4b80h      ;*if_icmple {reexecute=0 rethrow=0 return_oop=0}
│ │                                                ; - codes.dbg.FindMaxBenchmark::findMax_if_else@23 (line 34)
0,01%   │ ╰  0x00000137d24d4b92: jmp     137d24d4b77h      ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
│                                                  ; - codes.dbg.FindMaxBenchmark::findMax_if_else@15 (line 33)
↘    0x00000137d24d4b94: test    rbx,rbx

这与您的发现一致:两个编译之间唯一的区别是反转的跳转条件:jnle 对比 jle。为什么jnle变体会更慢呢?

如果我们仔细观察基准测试代码,我们会意识到当前最大值更改的点很少发生。平均而言,data[i] > result只在整个循环中真正为真的情况下出现大约14次。这意味着jnle分支仅占用0.001%的时间,其余99.999%的时间执行通过下一个jmp指令进行。

相反,第二个变体中的jle指令在99.999%的时间内被使用,执行几乎不会达到后续的jmp指令。因此,第一个循环每次迭代执行7条指令,

英文:

First, in order to minimize the amount of irrelevant ASM code and to simplify analysis, let's add the following JVM options:

  • -XX:LoopUnrollLimit=0 - turns off loop unrolling;
  • -XX:-UseCountedLoopSafepoints - eliminates safepoint polling from the loop.

Now the performance difference in favor of if_else will be even bigger, while the result assembly will be much simpler. Here is the loop body of both benchmarks.

findMax_if

         ╭     0x0000029707af78f5: jmp     29707af7908h
         │ ↗   0x0000029707af78f7: mov     r8d,ecx
         │ │   0x0000029707af78fa: nop     word ptr [rax+rax+0h]
  0,66%  │ │↗  0x0000029707af7900: inc     r9d               ;*iinc {reexecute=0 rethrow=0 return_oop=0}
         │ ││                                                ; - codes.dbg.FindMaxBenchmark::findMax_if@31 (line 18)
  1,02%  │ ││  0x0000029707af7903: cmp     r9d,r10d
         │╭││  0x0000029707af7906: jnl     29707af7914h      ;*aload_3 {reexecute=0 rethrow=0 return_oop=0}
         ││││                                                ; - codes.dbg.FindMaxBenchmark::findMax_if@18 (line 19)
  2,06%  ↘│││  0x0000029707af7908: mov     ecx,dword ptr [r11+r9*4+10h]
          │││                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
          │││                                                ; - codes.dbg.FindMaxBenchmark::findMax_if@21 (line 19)
 50,86%   │││  0x0000029707af790d: cmp     ecx,r8d
  0,02%   │╰│  0x0000029707af7910: jnle    29707af78f7h      ;*if_icmple {reexecute=0 rethrow=0 return_oop=0}
          │ │                                                ; - codes.dbg.FindMaxBenchmark::findMax_if@23 (line 19)
 41,01%   │ ╰  0x0000029707af7912: jmp     29707af7900h      ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
          │                                                  ; - codes.dbg.FindMaxBenchmark::findMax_if@15 (line 18)
          ↘    0x0000029707af7914: test    rbx,rbx

findMax_if_else

         ╭     0x00000137d24d4b75: jmp     137d24d4b88h
         │  ↗  0x00000137d24d4b77: mov     r8d,ecx
         │  │  0x00000137d24d4b7a: nop     word ptr [rax+rax+0h]
 72,63%  │ ↗│  0x00000137d24d4b80: inc     r9d               ;*iinc {reexecute=0 rethrow=0 return_oop=0}
         │ ││                                                ; - codes.dbg.FindMaxBenchmark::findMax_if_else@36 (line 33)
  0,05%  │ ││  0x00000137d24d4b83: cmp     r9d,r10d
  0,01%  │╭││  0x00000137d24d4b86: jnl     137d24d4b94h      ;*aload_3 {reexecute=0 rethrow=0 return_oop=0}
         ││││                                                ; - codes.dbg.FindMaxBenchmark::findMax_if_else@18 (line 34)
  6,47%  ↘│││  0x00000137d24d4b88: mov     ecx,dword ptr [r11+r9*4+10h]
          │││                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
          │││                                                ; - codes.dbg.FindMaxBenchmark::findMax_if_else@21 (line 34)
 15,93%   │││  0x00000137d24d4b8d: cmp     ecx,r8d
  0,18%   │╰│  0x00000137d24d4b90: jle     137d24d4b80h      ;*if_icmple {reexecute=0 rethrow=0 return_oop=0}
          │ │                                                ; - codes.dbg.FindMaxBenchmark::findMax_if_else@23 (line 34)
  0,01%   │ ╰  0x00000137d24d4b92: jmp     137d24d4b77h      ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
          │                                                  ; - codes.dbg.FindMaxBenchmark::findMax_if_else@15 (line 33)
          ↘    0x00000137d24d4b94: test    rbx,rbx

This aligns with your findings: the only difference between the two compilations is the inverted jump condition, jnle vs. jle. Why is the jnle variant slower, then?

If we look carefully at the benchmark code, we'll see that the current maximum changes quite seldom. On average, data[i] > result is true only about 14 times over the entire loop. This means the jnle branch is taken only about 0.001% of the time, and the remaining 99.999% of the time execution falls through to the following jmp instruction.

On the contrary, the jle instruction in the second variant is taken 99.999% of the time, and execution almost never reaches the following jmp. So the first loop retires 7 instructions per iteration, while the second one retires only 6.
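The figure of roughly 14 updates is no accident. For n distinct values in random order, the expected number of times the running maximum changes is the harmonic number H(n) = 1 + 1/2 + ... + 1/n, which is about 14.39 for n = 1,000,000. The sketch below checks this empirically. It assumes the array is filled with uniformly random ints; the class name and the seed are made up for illustration.

```java
import java.util.Random;

// Hypothetical helper, not part of the original benchmark.
class MaxUpdateCount {
    /** Counts how often data[i] > result holds, i.e. how often the running maximum is replaced. */
    static int countUpdates(int[] data) {
        int result = Integer.MIN_VALUE;
        int updates = 0;
        for (int value : data) {
            if (value > result) {
                result = value;
                updates++;
            }
        }
        return updates;
    }

    /** Harmonic number H(n): the expected update count for n distinct values in random order. */
    static double harmonic(int n) {
        double sum = 0.0;
        for (int i = 1; i <= n; i++) {
            sum += 1.0 / i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int size = 1_000_000;
        int[] data = new Random(42).ints(size).toArray(); // assumed data distribution
        System.out.println("observed updates: " + countUpdates(data));
        System.out.println("expected (H(n)):  " + harmonic(size));
    }
}
```

A single run can land some way from the expectation, but the count stays tiny compared with the million iterations, which is why one direction of the branch dominates so heavily.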

JMH has a built-in perfnorm profiler (available on Linux) that supplements benchmark results with CPU performance counter statistics. Let's run it with -prof perfnorm.

Benchmark                                                Mode  Cnt        Score    Error  Units
FindMaxBenchmark.findMax_if                             thrpt   10     1447.576 ±  8.854  ops/s
FindMaxBenchmark.findMax_if:CPI                         thrpt             0.335            #/op
FindMaxBenchmark.findMax_if:L1-dcache-load-misses       thrpt         63971.361            #/op
FindMaxBenchmark.findMax_if:L1-dcache-loads             thrpt       1014974.522            #/op
FindMaxBenchmark.findMax_if:L1-dcache-stores            thrpt          6105.121            #/op
FindMaxBenchmark.findMax_if:L1-icache-load-misses       thrpt          1641.074            #/op
FindMaxBenchmark.findMax_if:branch-misses               thrpt           146.305            #/op
FindMaxBenchmark.findMax_if:branches                    thrpt       3006620.048            #/op
FindMaxBenchmark.findMax_if:cycles                      thrpt       2358093.526            #/op
FindMaxBenchmark.findMax_if:dTLB-load-misses            thrpt          1085.740            #/op
FindMaxBenchmark.findMax_if:dTLB-loads                  thrpt       1012739.362            #/op
FindMaxBenchmark.findMax_if:dTLB-store-misses           thrpt            21.985            #/op
FindMaxBenchmark.findMax_if:dTLB-stores                 thrpt          6146.243            #/op
FindMaxBenchmark.findMax_if:iTLB-load-misses            thrpt           139.741            #/op
FindMaxBenchmark.findMax_if:iTLB-loads                  thrpt            42.031            #/op
FindMaxBenchmark.findMax_if:instructions                thrpt       7039394.622            #/op
FindMaxBenchmark.findMax_if_else                        thrpt   10     2472.400 ± 36.958  ops/s
FindMaxBenchmark.findMax_if_else:CPI                    thrpt             0.229            #/op
FindMaxBenchmark.findMax_if_else:L1-dcache-load-misses  thrpt         63353.481            #/op
FindMaxBenchmark.findMax_if_else:L1-dcache-loads        thrpt       1007856.753            #/op
FindMaxBenchmark.findMax_if_else:L1-dcache-stores       thrpt          3696.805            #/op
FindMaxBenchmark.findMax_if_else:L1-icache-load-misses  thrpt          1182.253            #/op
FindMaxBenchmark.findMax_if_else:branch-misses          thrpt            72.334            #/op
FindMaxBenchmark.findMax_if_else:branches               thrpt       2000460.845            #/op
FindMaxBenchmark.findMax_if_else:cycles                 thrpt       1380927.546            #/op
FindMaxBenchmark.findMax_if_else:dTLB-load-misses       thrpt           845.629            #/op
FindMaxBenchmark.findMax_if_else:dTLB-loads             thrpt       1006135.685            #/op
FindMaxBenchmark.findMax_if_else:dTLB-store-misses      thrpt            13.336            #/op
FindMaxBenchmark.findMax_if_else:dTLB-stores            thrpt          3545.950            #/op
FindMaxBenchmark.findMax_if_else:iTLB-load-misses       thrpt            80.233            #/op
FindMaxBenchmark.findMax_if_else:iTLB-loads             thrpt            19.009            #/op
FindMaxBenchmark.findMax_if_else:instructions           thrpt       6018937.376            #/op

Perf counters confirm that findMax_if executes about 7M instructions with 3M branches per benchmark invocation, whereas findMax_if_else executes about 6M instructions with 2M branches. I guess it's clear now where the difference comes from, so what about the other questions?
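Dividing the counters by the array size (1,000,000 elements per benchmark invocation) makes the per-iteration picture explicit. The following sketch only redoes the arithmetic on the numbers from the table above; the class and method names are made up for illustration.

```java
// Hypothetical helper that normalizes the perfnorm counters quoted above.
class PerfCounterMath {
    static final int SIZE = 1_000_000; // array elements processed per benchmark invocation

    /** Converts a per-invocation counter into a per-element value. */
    static double perElement(double counterPerOp) {
        return counterPerOp / SIZE;
    }

    /** Cycles per instruction, corresponding to the CPI row of the perfnorm output. */
    static double cpi(double cycles, double instructions) {
        return cycles / instructions;
    }

    static void report(String name, double instructions, double branches, double cycles) {
        System.out.println(name
                + ": instructions/element=" + perElement(instructions)
                + ", branches/element=" + perElement(branches)
                + ", cycles/element=" + perElement(cycles)
                + ", CPI=" + cpi(cycles, instructions));
    }

    public static void main(String[] args) {
        // Values copied from the perfnorm table
        report("findMax_if", 7039394.622, 3006620.048, 2358093.526);
        report("findMax_if_else", 6018937.376, 2000460.845, 1380927.546);
    }
}
```

Per element this comes to roughly 7 instructions, 3 branches and 2.4 cycles for findMax_if, against 6 instructions, 2 branches and 1.4 cycles for findMax_if_else. The cycle ratio of about 1.7 matches the throughput ratio of 2472 to 1447 ops/s.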

> Is normal to add else statement containing code which doesn't change
> anything

I don't think so, if only because it looks counterintuitive and makes the code harder to read and understand. It's just a matter of luck that the redundant code inverted the branch condition in a favorable way. Replace your random array with a sorted one, so that data[i] > result is mostly true, and findMax_if will become the fastest option.

> What is the point of using Math.max if simple if else statement has
> higher throughput?

Again, this is not always true. It depends heavily on the nature of the data. When the branches are easy to predict, the if statement performs better. But as soon as the branch predictor starts to fail often, performance drops drastically. Math.max, being a JVM intrinsic method, is translated to the branchless cmov instruction, which has the advantage of stable performance regardless of the data distribution.

Here is an example data set where Math.max greatly outperforms all other options:

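public void setup() {
    Random r = new Random();
    // Start from a sorted array of random ints
    this.tab = r.ints(SIZE).sorted().toArray();
    // Overwrite elements with 0 at random strides of 0, 1 or 2,
    // so the outcome of the comparison becomes hard to predict
    for (int i = 0; i < tab.length; i += ThreadLocalRandom.current().nextInt(3)) {
        tab[i] = 0;
    }
}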
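To see why this data set is hard on the branch predictor, one can count how often the comparison data[i] > result succeeds for different inputs. The sketch below is an illustration only: it rebuilds the array from setup() with a seeded Random instead of ThreadLocalRandom so that runs are repeatable, and the class and method names are made up.

```java
import java.util.Random;

// Hypothetical helper, not part of the original benchmark.
class BranchOutcomeStats {
    /** Fraction of iterations in which data[i] > result holds and the maximum is updated. */
    static double takenFraction(int[] data) {
        int result = Integer.MIN_VALUE;
        int taken = 0;
        for (int value : data) {
            if (value > result) {
                result = value;
                taken++;
            }
        }
        return (double) taken / data.length;
    }

    /** Same construction as setup(), but driven by a seeded Random (an assumption made for repeatability). */
    static int[] sortedWithRandomZeroes(int size, long seed) {
        Random r = new Random(seed);
        int[] tab = r.ints(size).sorted().toArray();
        for (int i = 0; i < tab.length; i += r.nextInt(3)) {
            tab[i] = 0;
        }
        return tab;
    }

    public static void main(String[] args) {
        int size = 1_000_000;
        int[] random = new Random(1).ints(size).toArray();
        int[] sorted = new Random(1).ints(size).sorted().toArray();
        int[] mixed = sortedWithRandomZeroes(size, 1);
        System.out.println("random array:        " + takenFraction(random));
        System.out.println("sorted array:        " + takenFraction(sorted));
        System.out.println("sorted with zeroes:  " + takenFraction(mixed));
    }
}
```

For the random array the comparison almost never succeeds, and for the sorted array it almost always does, so both are easy to predict. In the mixed array it succeeds in a substantial share of iterations at randomly chosen positions, which is the case where a branchless cmov has the edge.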

huangapple
  • Posted on September 8, 2020, 04:30:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63783852.html