Shenandoah self healing barriers
Question
The title pretty much says it all: what are these self-healing barriers, and why are they important in Shenandoah 2.0?
Answer 1
Score: 2
This explanation builds on the first part and the second part of some earlier answers I wrote about Shenandoah 2.0.
To really answer this question, we need to look at how the load reference barrier is implemented and how a GC cycle behaves in general.
When a GC cycle is triggered, it first chooses the regions with the most garbage, i.e. regions in which very few objects are still alive; these regions form the collection set (this will matter later).
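The region selection can be pictured with a small sketch. This is only an illustration under assumptions: `Region`, `chooseCollectionSet` and the `minGarbageRatio` threshold are hypothetical names, and the real Shenandoah heuristics are considerably more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Shenandoah's real data structures.
final class CollectionSetSketch {

    // A region with its used bytes and the bytes that marking found alive.
    static final class Region {
        final String name;
        final long usedBytes;
        final long liveBytes;

        Region(String name, long usedBytes, long liveBytes) {
            this.name = name;
            this.usedBytes = usedBytes;
            this.liveBytes = liveBytes;
        }

        long garbageBytes() {
            return usedBytes - liveBytes;
        }
    }

    // Picks the regions whose garbage share is at least minGarbageRatio,
    // ordered from most garbage to least.
    static List<Region> chooseCollectionSet(List<Region> regions, double minGarbageRatio) {
        List<Region> chosen = new ArrayList<>();
        for (Region r : regions) {
            if (r.usedBytes > 0
                    && (double) r.garbageBytes() / r.usedBytes >= minGarbageRatio) {
                chosen.add(r);
            }
        }
        chosen.sort((a, b) -> Long.compare(b.garbageBytes(), a.garbageBytes()));
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> heap = Arrays.asList(
                new Region("r0", 1000, 900),   // mostly live: expensive to evacuate
                new Region("r1", 1000, 50),    // mostly garbage
                new Region("r2", 1000, 200));  // mostly garbage
        for (Region r : chooseCollectionSet(heap, 0.5)) {
            System.out.println(r.name + " garbage=" + r.garbageBytes());
        }
    }
}
```

Picking regions with few live objects keeps the amount of copying small, which is why only a handful of objects end up needing a forwardee.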
The simplest way to understand this topic is via an example. Suppose this is the layout that currently exists in a certain region:
refA   refB
     |
 ---------
 | mark  |
 ---------
 | i = 0 |
 | j = 0 |
 ---------
There is an object in the region, and two references point to it: refA and refB. GC kicks in and this region is chosen to be garbage collected. At the same time, there are active threads in the application that try to access this object via refA and refB. Since this object is alive, at some point it needs to be evacuated to a new region (part of the mark-compact phase).
So: GC is active and, at the same time, we read via refA/refB. When we do this read, we hit the load-reference-barrier, which is implemented in the Shenandoah sources. Internally it has some "filters" (a bunch of if/else statements). Specifically:
- It checks if "evacuation is currently in progress". This is done via a thread-local flag that is set when evacuation first starts. Let's suppose the answer to this is: yes.
- It checks if the object that we are currently operating on is in the "collection set", i.e. it lives in a region that was chosen for evacuation (and, since we reached it through a reference, it is alive). Let's suppose this is "yes" also.
- The last check is to find out if this object was already "copied" to a different region (it was evacuated). Let's suppose the answer to this is "no", i.e. obj == fwd.
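The three filters can be sketched as a chain of early returns. This is a simplified, single-threaded model and not the HotSpot code: `Obj`, `inCollectionSet`, `evacuationInProgress` and `evacuate` are hypothetical names, and collection-set membership is stored on the object here, whereas the real check is made per region.

```java
// Hypothetical model of the barrier's filters; names are illustrative only.
final class LrbFilterSketch {

    static final class Obj {
        final boolean inCollectionSet; // simplification: really a per-region property
        Obj fwd = this;                // forwarding pointer; obj == fwd until copied
        int i, j;

        Obj(boolean inCollectionSet) {
            this.inCollectionSet = inCollectionSet;
        }
    }

    // Stands in for the thread-local flag that is set when evacuation starts.
    static boolean evacuationInProgress = false;

    static Obj loadReferenceBarrier(Obj obj) {
        if (obj == null) {
            return null;
        }
        if (!evacuationInProgress) {
            return obj;                // filter 1: no evacuation running
        }
        if (!obj.inCollectionSet) {
            return obj;                // filter 2: object is not going to move
        }
        if (obj.fwd != obj) {
            return obj.fwd;            // filter 3: already copied, use the forwardee
        }
        return evacuate(obj);          // slow path: copy the object now
    }

    // Copies the object and records the copy as its forwardee.
    static Obj evacuate(Obj obj) {
        Obj copy = new Obj(false);
        copy.i = obj.i;
        copy.j = obj.j;
        obj.fwd = copy;
        return copy;
    }

    public static void main(String[] args) {
        Obj old = new Obj(true);
        evacuationInProgress = true;
        Obj seen = loadReferenceBarrier(old);
        System.out.println("copied on first read: " + (seen != old));
    }
}
```

Each filter that answers "no" lets the read return immediately, so the expensive copy only happens when all three conditions line up.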
At this point, a few things happen. First, a copy is created, and the mark word of the original object is made to point to that copy, the forwardee:
                    refA   refB
                         |
 --------------      ---------
 | forwardee  | ---- | mark  |
 --------------      ---------
 | i = 0      |      | i = 0 |
 | j = 0      |      | j = 0 |
 --------------      ---------
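Since several threads may hit the slow path for the same object at once, installing the forwardee has to be atomic. A minimal sketch of that step, assuming a hypothetical `Obj` class whose `forwardee` field stands in for the mark word (an `AtomicReference` is used here only to model the compare-and-swap):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of "copy, then install the forwardee"; not the HotSpot code.
final class EvacuationSketch {

    static final class Obj {
        // Stand-in for the mark word: refers to the object itself until
        // a forwardee has been installed.
        final AtomicReference<Obj> forwardee = new AtomicReference<>(this);
        int i, j;
    }

    // Copies the object and tries to publish the copy as its forwardee.
    // Whoever wins the CAS defines the forwardee; a loser drops its own
    // copy and uses the winner's.
    static Obj evacuate(Obj old) {
        Obj copy = new Obj();
        copy.i = old.i;
        copy.j = old.j;
        if (old.forwardee.compareAndSet(old, copy)) {
            return copy;
        }
        return old.forwardee.get();
    }

    public static void main(String[] args) {
        Obj old = new Obj();
        Obj first = evacuate(old);
        Obj second = evacuate(old);
        System.out.println("same forwardee for both callers: " + (first == second));
    }
}
```

The compare-and-swap guarantees that every thread ends up agreeing on a single copy, no matter how many of them tried to evacuate the object.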
Only later in the code would refA and refB be updated to point to the new (copied) object. That has an interesting consequence: until refA and refB are actually made to point to the new object, the object they currently point to is still in the "collection set". So while GC is active, even though the forwardee has already been established, the load-reference-barrier still has to do some work on every read through them.
So the very smart people behind Shenandoah asked: why not update the reference right there, immediately after the forwardee has been established (or when the forwardee is already known because another reference got there first)? And this is exactly what they did.
Let's suppose we get back to our initial drawing:
refA   refB
     |
 ---------
 | mark  |
 ---------
 | i = 0 |
 | j = 0 |
 ---------
And again, all of the filters "pass":
- there is a thread that reads via refA
- GC is active
- the object behind refA and refB is alive.
This is what will happen with "self healing barriers":
      refA             refB
        |                |
 --------------      ---------
 | forwardee  | ---- | mark  |
 --------------      ---------
 | i = 0      |      | i = 0 |
 | j = 0      |      | j = 0 |
 --------------      ---------
The difference is obvious: refA was updated on the spot, via CAS, to point to the new object. If there is another read via refA while GC is still active, the load-reference-barrier executes much faster. Why? Because refA now points to an object that is not in the "collection set".
But this also means that if we read via refB and see that fwd != obj, the code can do the same trick and update refB in place, at the time of the first read via refB.
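The whole "heal on read" idea can be sketched in a few lines. This is a model under assumptions, not Shenandoah's code: `Obj`, `load` and `slowPathHits` are hypothetical names, each reference is modelled as an `AtomicReference` slot, and the forwardee is installed without a CAS because the example is single-threaded.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of a self-healing read; names are illustrative only.
final class SelfHealingSketch {

    static final class Obj {
        final boolean inCollectionSet;  // simplification: really a per-region property
        volatile Obj fwd = this;        // obj == fwd until the object is copied
        int i, j;

        Obj(boolean inCollectionSet) {
            this.inCollectionSet = inCollectionSet;
        }
    }

    // Counts how often a read had to leave the fast path.
    static int slowPathHits = 0;

    // Reads through a reference slot (think: the location holding refA or refB).
    static Obj load(AtomicReference<Obj> slot) {
        Obj obj = slot.get();
        if (obj == null || !obj.inCollectionSet) {
            return obj;                 // fast path: nothing to fix
        }
        slowPathHits++;
        Obj fwd = obj.fwd;
        if (fwd == obj) {               // not copied yet: copy it now
            fwd = new Obj(false);
            fwd.i = obj.i;
            fwd.j = obj.j;
            obj.fwd = fwd;              // single-threaded shortcut, no CAS here
        }
        // Self-healing: swing the slot to the copy, but only if it still holds
        // the old reference, so a concurrent store by the application survives.
        slot.compareAndSet(obj, fwd);
        return fwd;
    }

    public static void main(String[] args) {
        Obj old = new Obj(true);
        AtomicReference<Obj> refA = new AtomicReference<>(old);
        AtomicReference<Obj> refB = new AtomicReference<>(old);

        load(refA);  // slow path, heals refA
        load(refA);  // fast path from now on
        load(refB);  // slow path once, heals refB
        System.out.println("slow path hits: " + slowPathHits);
    }
}
```

A CAS is used for the slot rather than a plain store because the application may write a different reference into the same location while the barrier runs; the CAS simply fails in that case and the newer value wins.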
This improves performance according to the people familiar with the matter, and I trust them.