LinuxPerfAsmProfiler shows the Java code corresponding to an assembly hot spot on Java 8, but not on Java 14

Question

While investigating an issue related to the instantiation of Spring's org.springframework.util.ConcurrentReferenceHashMap (as of spring-core-5.1.3.RELEASE), I used LinuxPerfAsmProfiler, which ships with JMH, to profile the generated assembly.

I simply run this:

@Benchmark
public Object measureInit() {
  return new ConcurrentReferenceHashMap<>();
}

Benchmarking on JDK 8 allows us to identify one of the non-obvious hot spots:

  0.61%        0x00007f32d92772ea: lock addl $0x0,(%rsp)     ;*putfield count
                                                             ; - org.springframework.util.ConcurrentReferenceHashMap$Segment::<init>@11 (line 476)
                                                             ; - org.springframework.util.ConcurrentReferenceHashMap::<init>@141 (line 184)
 15.81%        0x00007f32d92772ef: mov    0x60(%r15),%rdx

This corresponds to the unnecessary assignment of the default value to a volatile field:

protected final class Segment extends ReentrantLock {
  private volatile int count = 0;
}

Segment, in turn, is instantiated in a loop in the constructor of ConcurrentReferenceHashMap:

public ConcurrentReferenceHashMap(
    int initialCapacity, float loadFactor, int concurrencyLevel, ReferenceType referenceType) {
  this.loadFactor = loadFactor;
  this.shift = calculateShift(concurrencyLevel, MAXIMUM_CONCURRENCY_LEVEL);
  int size = 1 << this.shift;
  this.referenceType = referenceType;
  int roundedUpSegmentCapacity = (int) ((initialCapacity + size - 1L) / size);
  this.segments = (Segment[]) Array.newInstance(Segment.class, size);
  for (int i = 0; i < this.segments.length; i++) {
    this.segments[i] = new Segment(roundedUpSegmentCapacity);
  }
}

So the instruction is likely to be really hot. The full layout of assembly can be found in my gist.
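To make the point about the redundant initializer concrete, here is a minimal sketch. The class names are hypothetical stand-ins for Segment, not Spring's actual code. Both variants observe the same value, because an int field already defaults to 0; the difference is only that the explicit initializer compiles to a putfield in the constructor, which for a volatile field is a volatile store.

```java
// Hypothetical stand-ins for Segment; the names are illustrative only.
final class VolatileDefaultDemo {

    // Mirrors the field from the question: the explicit "= 0" is compiled
    // into a putfield inside the constructor, i.e. a volatile store.
    static final class WithInitializer {
        private volatile int count = 0;

        int count() {
            return count;
        }
    }

    // Relies on the default value of an int field; the constructor
    // contains no store to this field.
    static final class WithoutInitializer {
        private volatile int count;

        int count() {
            return count;
        }
    }

    public static void main(String[] args) {
        // Both lines print 0: the observable state is identical.
        System.out.println(new WithInitializer().count());
        System.out.println(new WithoutInitializer().count());
    }
}
```

Running javap -c on the two nested classes should show the putfield only in the constructor of the first one.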

Then I ran the same benchmark on JDK 14, again with LinuxPerfAsmProfiler, but this time the captured assembly contains no explicit reference to volatile int count = 0.

Looking for the lock addl $0x0 instruction (a locked add of zero to a stack slot, which HotSpot emits on x86 as the memory barrier following a volatile store), I found this:

  0.08%                            0x00007f3717d46187:   lock addl $0x0,-0x40(%rsp)
 23.74%                            0x00007f3717d4618d:   mov    0x120(%r15),%rbx

which is likely to correspond to volatile int count = 0 because it follows the constructor call of Segment's superclass ReentrantLock:

  0.77%                            0x00007f3717d46140:   movq   $0x0,0x18(%rax)              ;*new {reexecute=0 rethrow=0 return_oop=0}
                                                                                             ; - java.util.concurrent.locks.ReentrantLock::<init>@5 (line 294)
                                                                                             ; - org.springframework.util.ConcurrentReferenceHashMap$Segment::<init>@6 (line 484)
                                                                                             ; - org.springframework.util.ConcurrentReferenceHashMap::<init>@141 (line 184)
  0.06%                            0x00007f3717d46148:   mov    %r8,%rcx
  0.05%                            0x00007f3717d4614b:   mov    %rax,%rbx
  0.03%                            0x00007f3717d4614e:   shr    $0x3,%rbx
  0.74%                            0x00007f3717d46152:   mov    %ebx,0xc(%r8)
  0.06%                            0x00007f3717d46156:   mov    %rax,%rbx
  0.05%                            0x00007f3717d46159:   xor    %rcx,%rbx
  0.02%                            0x00007f3717d4615c:   shr    $0x14,%rbx
  0.72%                            0x00007f3717d46160:   test   %rbx,%rbx
                                  0x00007f3717d46163:   je     0x00007f3717d4617f
                                  0x00007f3717d46165:   shr    $0x9,%rcx
                                  0x00007f3717d46169:   movabs $0x7f370a872000,%rdi
                                  0x00007f3717d46173:   add    %rcx,%rdi
                                  0x00007f3717d46176:   cmpb   $0x8,(%rdi)
  0.00%                           0x00007f3717d46179:   jne    0x00007f3717d46509
  0.04%                           0x00007f3717d4617f:   movl   $0x0,0x14(%r8)
  0.08%                            0x00007f3717d46187:   lock addl $0x0,-0x40(%rsp)
 23.74%                            0x00007f3717d4618d:   mov    0x120(%r15),%rbx

The problem is that I don't have any mention of putfield count in the generated assembly at all.

Could anyone explain why I don't see it?

Answer 1

Score: 1

It turned out that an hsdis built for one JDK version (e.g. JDK 8) can't be used with another (e.g. JDK 11). For a perfect match you need to build hsdis from the JDK sources, then build the JDK itself, and run the application on that ad-hoc build.

This approach worked perfectly for me when I was investigating https://stackoverflow.com/questions/70272651/missing-bounds-checking-elimination-in-string-constructor/70296859.
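Before rebuilding anything, it can help to check which hsdis library, if any, sits under the JDK you are actually running. The sketch below is only a rough aid: the class name HsdisLocator is hypothetical, and the candidate directories and the hsdis-<arch>.so file name are assumptions about common Linux JDK layouts that may not hold for every build. It does not verify that the library was built from matching sources.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the directory layout below is an assumption.
final class HsdisLocator {

    // Returns candidate hsdis locations under the given JDK home.
    static List<Path> candidates(String javaHome, String arch) {
        String lib = "hsdis-" + arch + ".so";
        List<Path> result = new ArrayList<>();
        // Layout assumed for JDK 9 and later
        result.add(Paths.get(javaHome, "lib", "server", lib));
        result.add(Paths.get(javaHome, "lib", lib));
        // Layout assumed for JDK 8, where lib contains an arch directory
        result.add(Paths.get(javaHome, "lib", arch, "server", lib));
        result.add(Paths.get(javaHome, "lib", arch, lib));
        return result;
    }

    public static void main(String[] args) {
        String home = System.getProperty("java.home");
        String arch = System.getProperty("os.arch");
        System.out.println("Runtime: " + System.getProperty("java.runtime.version"));
        for (Path p : candidates(home, arch)) {
            System.out.println(p + (Files.exists(p) ? "  [found]" : "  [missing]"));
        }
    }
}
```

Run it with the same java binary that the benchmark forks use, so that the printed runtime version and the library location refer to the JDK being profiled.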

huangapple
  • Published on 2020-08-13 22:52:22
  • Please keep the link to this article when reposting: https://go.coder-hub.com/63397711.html