LinuxPerfAsmProfiler shows the Java code corresponding to an assembly hot spot for Java 8, but not for Java 14
Question
When investigating an issue related to the instantiation of Spring's org.springframework.util.ConcurrentReferenceHashMap (as of spring-core-5.1.3.RELEASE), I used LinuxPerfAsmProfiler, shipped along with JMH, to profile the generated assembly.
I simply ran this:
@Benchmark
public Object measureInit() {
    return new ConcurrentReferenceHashMap<>();
}
Benchmarking on JDK 8 allows us to identify one of the non-obvious hot spots:
0.61% 0x00007f32d92772ea: lock addl $0x0,(%rsp) ;*putfield count
; - org.springframework.util.ConcurrentReferenceHashMap$Segment::<init>@11 (line 476)
; - org.springframework.util.ConcurrentReferenceHashMap::<init>@141 (line 184)
15.81% 0x00007f32d92772ef: mov 0x60(%r15),%rdx
This corresponds to the unnecessary assignment of the default value to a volatile field:
protected final class Segment extends ReentrantLock {
    private volatile int count = 0;
}
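To make the cost of the initializer concrete, here is a minimal sketch using two hypothetical classes (WithInitializer and WithoutInitializer are illustrative names, not part of Spring). Both report count == 0 after construction. The difference is that javac compiles the explicit `= 0` into a putfield inside the constructor, and for a volatile field the JIT has to follow that store with a memory barrier. Running `javap -c -p` on the two nested classes should show the extra putfield in the first one only.

```java
import java.util.concurrent.locks.ReentrantLock;

public class VolatileInitSketch {

    // Hypothetical stand-in for Segment: explicit assignment of the default value.
    public static final class WithInitializer extends ReentrantLock {
        private volatile int count = 0; // javac emits a putfield for this in <init>

        public int count() {
            return count;
        }
    }

    // Same field without the initializer: the JVM already zeroes it on allocation,
    // so the constructor performs no volatile store.
    public static final class WithoutInitializer extends ReentrantLock {
        private volatile int count;

        public int count() {
            return count;
        }
    }

    public static void main(String[] args) {
        // Observable behaviour is identical; only the generated code differs.
        System.out.println(new WithInitializer().count());
        System.out.println(new WithoutInitializer().count());
    }
}
```

Dropping the initializer therefore changes nothing semantically, which is why the store is described here as unnecessary.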
And Segment is, in turn, instantiated in a loop in the constructor of ConcurrentReferenceHashMap:
public ConcurrentReferenceHashMap(
        int initialCapacity, float loadFactor, int concurrencyLevel, ReferenceType referenceType) {
    this.loadFactor = loadFactor;
    this.shift = calculateShift(concurrencyLevel, MAXIMUM_CONCURRENCY_LEVEL);
    int size = 1 << this.shift;
    this.referenceType = referenceType;
    int roundedUpSegmentCapacity = (int) ((initialCapacity + size - 1L) / size);
    this.segments = (Segment[]) Array.newInstance(Segment.class, size);
    for (int i = 0; i < this.segments.length; i++) {
        this.segments[i] = new Segment(roundedUpSegmentCapacity);
    }
}
So the instruction is likely to be really hot. The full assembly layout can be found in my gist.
Then I ran the same benchmark on JDK 14, again with LinuxPerfAsmProfiler, but this time the captured assembly contains no explicit reference to volatile int count = 0.
Looking for the lock addl $0x0 instruction, the locked no-op add of 0 that HotSpot emits on x86 as the memory barrier after a volatile store, I found this:
0.08% │ 0x00007f3717d46187: lock addl $0x0,-0x40(%rsp)
23.74% │ 0x00007f3717d4618d: mov 0x120(%r15),%rbx
which is likely to correspond to volatile int count = 0 because it follows the constructor call of Segment's superclass ReentrantLock:
0.77% │ 0x00007f3717d46140: movq $0x0,0x18(%rax) ;*new {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.util.concurrent.locks.ReentrantLock::<init>@5 (line 294)
│ ; - org.springframework.util.ConcurrentReferenceHashMap$Segment::<init>@6 (line 484)
│ ; - org.springframework.util.ConcurrentReferenceHashMap::<init>@141 (line 184)
0.06% │ 0x00007f3717d46148: mov %r8,%rcx
0.05% │ 0x00007f3717d4614b: mov %rax,%rbx
0.03% │ 0x00007f3717d4614e: shr $0x3,%rbx
0.74% │ 0x00007f3717d46152: mov %ebx,0xc(%r8)
0.06% │ 0x00007f3717d46156: mov %rax,%rbx
0.05% │ 0x00007f3717d46159: xor %rcx,%rbx
0.02% │ 0x00007f3717d4615c: shr $0x14,%rbx
0.72% │ 0x00007f3717d46160: test %rbx,%rbx
╭ │ 0x00007f3717d46163: je 0x00007f3717d4617f
│ │ 0x00007f3717d46165: shr $0x9,%rcx
│ │ 0x00007f3717d46169: movabs $0x7f370a872000,%rdi
│ │ 0x00007f3717d46173: add %rcx,%rdi
│ │ 0x00007f3717d46176: cmpb $0x8,(%rdi)
0.00% │ │ 0x00007f3717d46179: jne 0x00007f3717d46509
0.04% ↘ │ 0x00007f3717d4617f: movl $0x0,0x14(%r8)
0.08% │ 0x00007f3717d46187: lock addl $0x0,-0x40(%rsp)
23.74% │ 0x00007f3717d4618d: mov 0x120(%r15),%rbx
The problem is that there is no mention of putfield count in the generated assembly at all.
Could anyone explain why I don't see it?
Answer 1
Score: 1
It turned out that you can't use an hsdis built for one JDK version (e.g. JDK 8) with a different one (e.g. JDK 11). For a perfect match you need to build hsdis from the JDK sources, then build the JDK itself, and run the application on this ad-hoc build.
This approach worked perfectly for me when I was investigating https://stackoverflow.com/questions/70272651/missing-bounds-checking-elimination-in-string-constructor/70296859.
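Before swapping in a freshly built hsdis, it can help to check which JDK the benchmark is actually running on and whether that JDK already has a disassembler plugin next to it. The sketch below is only an illustration: HsdisLocator is a hypothetical helper, and the two candidate paths are an assumption based on the conventional layout of JDK 9 and later on Linux. JDK 8 uses a different directory layout, and the library can also be picked up from LD_LIBRARY_PATH. The check only reports whether a file exists, not which JDK version it was built for.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class HsdisLocator {

    // Assumed candidate locations of the hsdis plugin under a JDK home (Linux, JDK 9+).
    public static List<Path> candidates(String javaHome, String arch) {
        String library = "hsdis-" + arch + ".so";
        List<Path> result = new ArrayList<>();
        result.add(Paths.get(javaHome, "lib", "server", library));
        result.add(Paths.get(javaHome, "lib", library));
        return result;
    }

    public static void main(String[] args) {
        String home = System.getProperty("java.home");
        String arch = System.getProperty("os.arch");

        // Print the runtime the process is really using, to compare with the hsdis build.
        System.out.println("java.home       = " + home);
        System.out.println("java.vm.version = " + System.getProperty("java.vm.version"));

        for (Path candidate : candidates(home, arch)) {
            String status = Files.exists(candidate) ? "found" : "missing";
            System.out.println(candidate + " (" + status + ")");
        }
    }
}
```

If a file is reported as found but was copied over from another JDK installation, that is the mismatch the answer warns about.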