Shenandoah 2.0 elimination of forwarding pointer

Question

In Shenandoah 1.0 every single object had an additional header, called the forwarding pointer. Why was it needed, and what led to its elimination in Shenandoah 2.0?

Answer 1

Score: 6

First of all, every Java object has two headers: klass and mark. Every instance has always had them (recent JVMs may handle their flags slightly differently internally, for example), and they serve several purposes; only one of them is covered further down in this answer.

The need for a forwarding pointer is explained in the second part of this answer. In Shenandoah 1.0 the forwarding pointer is needed in both the read barrier and the write barrier (though reads could skip the barrier for some field types; I will not go into detail). In very simple words, it simplifies concurrent copying a great deal. As said in that answer, it allows the forwarding pointer to be atomically switched to the new copy of the object, after which all references can be concurrently updated to point to that new object.
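To make the idea concrete, here is a minimal sketch of a separate forwarding slot per object, as described for Shenandoah 1.0. It is a toy model in plain Java, not HotSpot code: the class `BrooksObject` and the methods `resolve` and `evacuate` are names made up for illustration, and the assumption is that the slot initially points at the object itself.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of an object with a dedicated forwarding slot
// (illustrative only; not how HotSpot lays out or accesses objects).
final class BrooksObject {
    // Assumed to start out pointing at the object itself; the GC swings it to the copy.
    final AtomicReference<BrooksObject> forwardee = new AtomicReference<>();
    volatile int i;
    volatile int j;

    BrooksObject(int i, int j) {
        this.i = i;
        this.j = j;
        forwardee.set(this);
    }

    // Barrier: always go through the forwarding slot to reach the current copy.
    static BrooksObject resolve(BrooksObject ref) {
        return ref.forwardee.get();
    }

    // Evacuation: copy the fields, then atomically install the copy.
    // If another thread won the race, its copy is the one everybody uses.
    static BrooksObject evacuate(BrooksObject from) {
        BrooksObject copy = new BrooksObject(from.i, from.j);
        if (from.forwardee.compareAndSet(from, copy)) {
            return copy;
        }
        return from.forwardee.get();
    }
}

final class BrooksDemo {
    public static void main(String[] args) {
        BrooksObject original = new BrooksObject(0, 0);
        BrooksObject refA = original;
        BrooksObject refB = original;

        BrooksObject copy = BrooksObject.evacuate(original);
        refB = copy;                          // references are updated one at a time

        BrooksObject.resolve(refA).i = 42;    // a write through the stale refA lands in the copy
        System.out.println(BrooksObject.resolve(refB).i); // prints 42
    }
}
```

The single compare-and-set is what makes the switch atomic; updating `refA` and `refB` afterwards can happen at any pace because every access goes through `resolve`.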

Things changed a bit in Shenandoah 2.0, where the "to-space invariant" is in place: all writes and reads are done via the to-space. This has one interesting consequence: once the to-space copy is established, the from-space copy is never used. Imagine a situation like this:

    refA            refB
      |               |
fwdPointer1 ---- fwdPointer2        
                      |
  ---------       ---------  
  | i = 0 |       | i = 0 | 
  | j = 0 |       | j = 0 | 
  ---------       ---------

In Shenandoah 1.0 there were cases when a read via refA could bypass the barrier (not use it at all) and so still read from the from-space copy. This was allowed for final fields, for example (via a special flag). This means that even if the to-space copy already existed and there were already references to it, there could still be reads (via refA) that went to the from-space copy. In Shenandoah 2.0 this is prohibited.

This information was used in a rather interesting way. Every object in Java is aligned to 64 bits (8 bytes), meaning the last 3 bits of its address are always zero. So they dropped the separate forwarding pointer and said: if the last two bits of the mark word are 11 (this is allowed since nothing else uses that pattern in this manner), the word is a forwarding pointer; otherwise the to-space copy does not yet exist and the word is a plain header. You can see it in action right here and you can trace the masking here and here.

It used to look like this:

| -------------------|
| forwarding Pointer |
| -------------------|

| -------------------|
|        mark        |
| -------------------|

| -------------------|
|        class       |
| -------------------|

And it was transformed to:

| -------------------|
| mark or forwarding |     // depending on the last two bits
| -------------------|

| -------------------|
|        class       |
| -------------------|
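The "mark or forwarding" decision comes down to a little bit masking. Below is a rough sketch of it in plain Java; the class `MarkWordSketch`, its constants and its method names are assumptions chosen for illustration and are not the HotSpot definitions.

```java
// Hypothetical helpers for a header word that is either a plain mark
// or a tagged forwarding pointer (illustrative only).
final class MarkWordSketch {
    static final long TAG_MASK = 0b11L;       // the lowest two bits of the word
    static final long FORWARDED_TAG = 0b11L;  // pattern meaning "this word is a forwardee"

    // Pack an 8-byte-aligned address into a word tagged as forwarded.
    // Alignment guarantees the low bits are free to carry the tag.
    static long encodeForwardee(long alignedAddress) {
        if ((alignedAddress & 0b111L) != 0) {
            throw new IllegalArgumentException("address must be 8-byte aligned");
        }
        return alignedAddress | FORWARDED_TAG;
    }

    static boolean isForwarded(long headerWord) {
        return (headerWord & TAG_MASK) == FORWARDED_TAG;
    }

    // Strip the tag to recover the address of the to-space copy.
    static long forwardeeAddress(long headerWord) {
        return headerWord & ~TAG_MASK;
    }

    public static void main(String[] args) {
        long plainMark = 0x28L;                         // ends in 00: a plain header in this sketch
        long forwarded = encodeForwardee(0x7f00_1000L); // made-up aligned address

        System.out.println(isForwarded(plainMark));     // prints false
        System.out.println(isForwarded(forwarded));     // prints true
        System.out.println(Long.toHexString(forwardeeAddress(forwarded))); // prints 7f001000
    }
}
```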

So here is a possible scenario (I'll skip class header for simplicity):

  refA, refB            
       |               
      mark   (last two bits are 00)   
       |              
    ---------   
    | i = 0 |      
    | j = 0 |      
    ---------  

GC kicks in. The object referenced by refA/refB is alive and thus must be evacuated (it is said to be in the "collection set"). First a copy is created, then the mark is atomically made to reference that copy (its last two bits are also set to 11, which makes it a forwardee and not a mark word):

  refA, refB            
       |               
     mark (11) ------  mark (00)   
                           |
    ---------          ---------
    | i = 0 |          | i = 0 |
    | j = 0 |          | j = 0 |
    ---------          ---------

Now one of the mark words has a bit pattern (ends in 11) that indicates that it is a forwardee and not a mark word anymore.

       refA              refB            
         |                 |               
     mark (11) ------  mark (00)   
                           |
    ---------          ---------
    | i = 0 |          | i = 0 |
    | j = 0 |          | j = 0 |
    ---------          ---------

refB can be moved concurrently, and then refA; ultimately there are no references to the from-space object and it is garbage. This is how the mark word acts as a forwarding pointer when needed.
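The whole scenario can be played through on a toy heap. This is only a sketch under heavy assumptions: the heap is an `AtomicLongArray`, every object is three words (mark, i, j), "addresses" are word index times 8, and `ToyHeap` with all its methods is invented for illustration. It also ignores details a real collector has to handle, such as writes racing with the copy.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical toy heap: each object is [mark, i, j]; not HotSpot code.
final class ToyHeap {
    static final long TAG = 0b11L;          // low two bits 11 mean "forwarded"
    static final int WORDS_PER_OBJECT = 3;  // mark, i, j

    final AtomicLongArray words = new AtomicLongArray(64);
    private int top = 1;                    // word 0 stays unused so address 0 is never handed out

    synchronized long allocate(long i, long j) {
        int index = top;
        top += WORDS_PER_OBJECT;
        words.set(index, 0L);               // plain mark, ends in 00
        words.set(index + 1, i);
        words.set(index + 2, j);
        return index * 8L;                  // fake 8-byte-aligned address
    }

    // Barrier: follow the mark word if it is tagged as a forwardee.
    long resolve(long address) {
        long mark = words.get((int) (address / 8));
        return (mark & TAG) == TAG ? (mark & ~TAG) : address;
    }

    // Copy the object, then install the tagged address of the copy with one CAS.
    long evacuate(long from) {
        int fromIndex = (int) (from / 8);
        long mark = words.get(fromIndex);
        if ((mark & TAG) == TAG) {
            return mark & ~TAG;             // somebody already evacuated it
        }
        long copy = allocate(words.get(fromIndex + 1), words.get(fromIndex + 2));
        words.set((int) (copy / 8), mark);  // the copy keeps the original mark word
        if (words.compareAndSet(fromIndex, mark, copy | TAG)) {
            return copy;
        }
        return resolve(from);               // lost the race; use the winner's copy
    }

    long readI(long ref) {
        return words.get((int) (resolve(ref) / 8) + 1);
    }

    void writeI(long ref, long value) {
        words.set((int) (resolve(ref) / 8) + 1, value);
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        long refA = heap.allocate(0, 0);
        long refB = refA;

        long copy = heap.evacuate(refA);    // mark of the old object now ends in 11
        refB = copy;                        // refB is updated first

        heap.writeI(refA, 7);               // stale refA still reaches the to-space copy
        System.out.println(heap.readI(refB)); // prints 7

        refA = heap.resolve(refA);          // later refA is updated as well
        System.out.println(refA == refB);   // prints true
    }
}
```

Once both references point at the copy, nothing refers to the old three words any more, which is the point where the from-space object becomes garbage.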

huangapple
  • Published on 2020-09-22 10:47:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64002388.html