Shenandoah 2.0 elimination of forwarding pointer
Question
In Shenandoah 1.0 every single object had an additional header, called the forwarding pointer. Why was it needed, and what led to its elimination in Shenandoah 2.0?
Answer 1
Score: 6
First of all, every single Java object has two headers: klass and mark. They have been present in each instance since forever (recent JVMs may handle their flags slightly differently internally, for example) and are used for various reasons (I will go into detail about only one of them a bit further in the answer).
The need for a forwarding pointer is literally in the second part of this answer. The forwarding pointer is needed in both the read barrier and the write barrier in Shenandoah 1.0 (though a read could skip the barrier for some field types; I will not go into detail). In very simple words, it greatly simplifies concurrent copying. As said in that answer, it allows the forwarding pointer to be atomically switched to the new copy of the object, after which all references can be concurrently updated to point to that new object.
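To make the idea a bit more concrete, here is a minimal sketch of a forwarding-pointer scheme in plain Java. It is only an illustration and not Shenandoah's actual code: the names BrooksPointerSketch, Obj, forwardee, resolve and evacuate are all made up, and an AtomicReference field stands in for the extra header word.

```java
import java.util.concurrent.atomic.AtomicReference;

public class BrooksPointerSketch {

    /** Hypothetical heap object: a forwarding slot plus two payload fields. */
    public static final class Obj {
        // Stand-in for the extra header word. It points at the object
        // itself until a to-space copy has been installed.
        public final AtomicReference<Obj> forwardee = new AtomicReference<>();
        public int i;
        public int j;

        public Obj(int i, int j) {
            this.i = i;
            this.j = j;
            forwardee.set(this); // self-forwarded initially
        }
    }

    /** Barrier-style resolve: follow the forwarding slot once. */
    public static Obj resolve(Obj ref) {
        return ref.forwardee.get();
    }

    /**
     * Copies the object and tries to install the copy with a single CAS.
     * A thread that loses the race drops its own copy and uses the winner's.
     */
    public static Obj evacuate(Obj from) {
        Obj copy = new Obj(from.i, from.j);
        if (from.forwardee.compareAndSet(from, copy)) {
            return copy;
        }
        return from.forwardee.get();
    }

    public static void main(String[] args) {
        Obj original = new Obj(0, 0);
        Obj refA = original;
        Obj refB = original;

        Obj copy = evacuate(original);
        refB = resolve(refB);      // refB is updated to the new copy
        resolve(refA).i = 42;      // a write through the stale refA lands in the copy

        System.out.println(refB == copy); // true
        System.out.println(refB.i);       // 42
    }
}
```

The point of the single compareAndSet is that the switch to the new copy happens in one step, while refA and refB can be fixed up later and independently.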
Things have changed a bit in Shenandoah 2.0, where the "to-space invariant" is in place, meaning all writes and reads are done via the to-space. This means one interesting thing: once the to-space copy is established, the from-space copy is never used. Imagine a situation like this:
   refA             refB
     |                |
fwdPointer1 ---- fwdPointer2
                      |
 ---------        ---------
 | i = 0 |        | i = 0 |
 | j = 0 |        | j = 0 |
 ---------        ---------
In Shenandoah 1.0 there were cases when a read via refA could bypass the barrier (not use it at all) and still read from the from-space copy. This was allowed for final fields, for example (via a special flag). This means that even if the to-space copy already existed and there were already references to it, there could still be reads (via refA) that would go to the from-space copy. In Shenandoah 2.0 this is prohibited.
This information was used in a rather interesting way. Every object in Java is aligned to 64 bits (8 bytes), meaning the last 3 bits of its address are always zero. So, they dropped the forwarding pointer and said: if the last two bits of the mark word are 11 (this is allowed since nothing else uses it in this manner), this is a forwarding pointer; otherwise the to-space copy does not yet exist and this is a plain header. You can see it in action right here and you can trace the masking here and here.
It used to look like this:
| -------------------|
| forwarding Pointer |
| -------------------|
| -------------------|
| mark |
| -------------------|
| -------------------|
| class |
| -------------------|
And it has been transformed to:
| -------------------|
| mark or forwarding | // depending on the last two bits
| -------------------|
| -------------------|
| class |
| -------------------|
So here is a possible scenario (I'll skip the class header for simplicity):
refA, refB
|
mark (last two bits are 00)
|
---------
| i = 0 |
| j = 0 |
---------
GC kicks in. The object referenced by refA/refB is alive, thus must be evacuated (it is said to be in the "collection set"). First a copy is created, and then mark is atomically made to reference that copy (the last two bits are also set to 11, to now make it a forwardee and not a mark word):
refA, refB
    |
mark (11) ------ mark (00)
                     |
---------        ---------
| i = 0 |        | i = 0 |
| j = 0 |        | j = 0 |
---------        ---------
Now one of the mark words has a bit pattern (ends in 11) that indicates that it is a forwardee and not a mark word anymore.
  refA             refB
    |                |
mark (11) ------ mark (00)
                     |
---------        ---------
| i = 0 |        | i = 0 |
| j = 0 |        | j = 0 |
---------        ---------
refB can be moved concurrently, and then refA; ultimately there are no references to the from-space object and it is garbage. This is how the mark word acts as a forwarding pointer, if needed.
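The whole scenario can be played through in a toy model. This is a sketch under heavy assumptions, not how the JVM is implemented: the heap is an AtomicLongArray of 8-byte words, an object is three words (mark, i, j; the class header is skipped as above), and every name in it (MarkWordEvacuationSketch, allocate, resolve, evacuate, readI, writeI) is hypothetical. It also assumes the mark word changes only through evacuation.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class MarkWordEvacuationSketch {
    public static final long TAG_MASK = 0b11L;
    public static final long FORWARDED_TAG = 0b11L;
    private static final int WORDS_PER_OBJECT = 3; // mark, i, j

    // Toy heap: one slot per 8-byte word, so address = slot index * 8.
    private final AtomicLongArray heap = new AtomicLongArray(64);
    private int nextFreeWord = 1; // word 0 stays unused

    /** Bump-allocates an object with a plain mark word (low bits 00). */
    public synchronized long allocate(long i, long j) {
        int base = nextFreeWord;
        nextFreeWord += WORDS_PER_OBJECT;
        heap.set(base, 0L);
        heap.set(base + 1, i);
        heap.set(base + 2, j);
        return base * 8L; // always 8-byte aligned
    }

    /** Returns the to-space address if the mark word is a forwardee. */
    public long resolve(long address) {
        long mark = heap.get((int) (address / 8));
        return (mark & TAG_MASK) == FORWARDED_TAG ? (mark & ~TAG_MASK) : address;
    }

    /** Copies the object, then installs the tagged address with one CAS. */
    public long evacuate(long from) {
        int fromBase = (int) (from / 8);
        long oldMark = heap.get(fromBase);
        if ((oldMark & TAG_MASK) == FORWARDED_TAG) {
            return oldMark & ~TAG_MASK; // already evacuated
        }
        long to = allocate(heap.get(fromBase + 1), heap.get(fromBase + 2));
        heap.set((int) (to / 8), oldMark); // the copy keeps the plain header
        if (heap.compareAndSet(fromBase, oldMark, to | FORWARDED_TAG)) {
            return to;
        }
        return resolve(from); // lost the race, use the winner's copy
    }

    /** All field accesses go through resolve, i.e. through the to-space. */
    public long readI(long ref) {
        return heap.get((int) (resolve(ref) / 8) + 1);
    }

    public void writeI(long ref, long value) {
        heap.set((int) (resolve(ref) / 8) + 1, value);
    }

    public static void main(String[] args) {
        MarkWordEvacuationSketch h = new MarkWordEvacuationSketch();
        long refA = h.allocate(0, 0);
        long refB = refA;

        h.evacuate(refA);
        refB = h.resolve(refB);  // refB now points into the to-space
        h.writeI(refA, 7);       // stale refA still writes to the copy

        System.out.println(h.readI(refB)); // 7
        refA = h.resolve(refA);
        System.out.println(refA == refB);  // true
    }
}
```

Since readI and writeI always resolve first, a stale refA can never observe or modify the from-space copy once the tagged mark word is installed, which is the to-space invariant in miniature.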