Why is a non-volatile variable updated in the CPU's shared cache?

Question

The flag variable is not volatile, so I am expecting to see an infinite loop in Thread 1. But I don't get why Thread 1 can see Thread 2's updates to the flag variable.

Why is a non-volatile variable updated in the CPU's shared cache? Is there a difference between a volatile and a non-volatile flag variable here?

public class FlagDemo {  // enclosing class added so the snippet compiles

    static boolean flag = true;

    public static void main(String[] args) {

        // Reader thread: loops until it observes flag == false
        new Thread(() -> {
            while (flag) {
                System.out.println("Running Thread1");
            }
        }).start();

        // Writer thread: plain (non-volatile) write to flag
        new Thread(() -> {
            flag = false;
            System.out.println("flag variable is set to False");
        }).start();
    }
}

Answer 1

Score: 5

There are zero guarantees that such a simple program will show a perceivable result. I mean, it isn't even guaranteed which thread will start first.

But in general, visibility effects are only guaranteed by the Java Language Specification, which carefully builds a so-called "happens-before" relationship. This is the only guarantee that you have, and it says exactly:

> A write to a volatile field happens-before every subsequent read of that field.

Without volatile, that safety net is gone. And you might say, "but I can't reproduce it". The answer to that would be:

  • ... on this run

  • ... on this platform

  • ... with this compiler

  • ... on this CPU

and so on.


The fact that you added a System.out.println in there (which internally has a synchronized section) only aggravates things, in the sense that it takes away more chances for that one thread to run forever.
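
To see how fragile this is, take the println out of the loop. Here is a minimal sketch (the class name VisibilityDemo and the 100 ms sleep are my own choices; the behavior is JVM- and platform-dependent, so treat it as an illustration, not a guarantee): with an empty loop body there is no internal synchronization left, the JIT is free to hoist the read of flag out of the loop, and on many JVMs this program never terminates.

public class VisibilityDemo {

    static boolean flag = true;  // plain, non-volatile field

    public static void main(String[] args) throws InterruptedException {
        new Thread(() -> {
            while (flag) {
                // empty body: nothing in here creates a happens-before edge,
                // so the JIT may compile this into a genuinely infinite loop
            }
            System.out.println("Reader observed flag == false");
        }).start();

        Thread.sleep(100);  // give the loop time to be JIT-compiled (arbitrary delay)
        flag = false;       // plain write: the reader may never observe it
        System.out.println("Writer set flag to false");
    }
}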


It took me a while, but I think I can come up with an example to prove that this can break. For that, you need a proper tool, designed for exactly these kinds of things: jcstress.

import org.openjdk.jcstress.annotations.Actor;
import org.openjdk.jcstress.annotations.Expect;
import org.openjdk.jcstress.annotations.JCStressTest;
import org.openjdk.jcstress.annotations.Outcome;
import org.openjdk.jcstress.annotations.State;
import org.openjdk.jcstress.infra.results.I_Result;

@JCStressTest
@State
@Outcome(id = "0", expect = Expect.ACCEPTABLE)
@Outcome(id = "3", expect = Expect.ACCEPTABLE_INTERESTING, desc = "racy read!!!")
@Outcome(id = "4", expect = Expect.ACCEPTABLE, desc = "reader thread sees everything that writer did")
public class NoVolatile {

    private int y = 1;
    private int x = 1;

    @Actor
    public void writerThread() {
        y = 2;
        x = 2;
    }

    @Actor
    public void readerThread(I_Result result) {
        if (x == 2) {
            int local = y;
            result.r1 = x + local;
        }
    }
}

You do not need to understand the code (though that would help), but overall it builds two "actors", i.e. two threads, that change two independent values: x and y. The interesting part is this:

if (x == 2) {
    int local = y;
    result.r1 = x + local;
}

If x == 2, we enter that if branch, and result.r1 should always be 4, right? But what if result.r1 is 3? What does that mean?

It would mean that x == 2 for sure (otherwise there would be no write to r1 at all, and result.r1 would stay zero), and it would mean that y == 1.

In other words, ThreadA (the writerThread) has performed its writes (we know for sure that x == 2, and since y is written first, y should also be 2 by now), but ThreadB (the readerThread) did not observe y == 2; it still saw y as 1.
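
Spelling the possibilities out (my own summary, matching the @Outcome annotations above):

// Possible results observed by readerThread without volatile:
//   x == 1          -> branch not taken, r1 stays 0           (outcome "0")
//   x == 2, y == 2  -> r1 = 2 + 2 = 4                         (outcome "4")
//   x == 2, y == 1  -> r1 = 2 + 1 = 3, the racy read: the
//                      second write is visible but the first
//                      one is not                             (outcome "3")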

And these are exactly the cases defined by the @Outcome(...) annotations; obviously the one I care about is 3. If I run this (up to you to figure out how), I will see that the ACCEPTABLE_INTERESTING case is indeed present in the output.

If I make a single change:

 private volatile int x = 1;

By adding volatile, I start to follow the JLS. Specifically, these 3 points from the happens-before rules (JLS §17.4.5):

> If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).

> A write to a volatile field happens-before every subsequent read of that field.

> If hb(x, y) and hb(y, z), then hb(x, z).

This means that if I see x == 2, I must also see y == 2 (unlike without volatile). If I now run the example, 3 is not going to be part of the results.
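
Written out as comments on the example above (my own annotation of the three rules):

// Happens-before edges once x is volatile:
//
//   writerThread:   y = 2;              // (1) program order:  hb(1, 2)
//                   x = 2;              // (2) volatile write
//
//   readerThread:   if (x == 2) {       // (3) volatile read:  hb(2, 3)
//                       int local = y;  // (4) program order:  hb(3, 4)
//                   }
//
// By transitivity, hb(1, 4): a reader that sees x == 2 must also see y == 2.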


This should prove that a non-volatile read can be racy, and as such can be lost, while a volatile read can't be lost.

Answer 2

Score: 4

On x86, caches are always coherent. So if CPU1 executes a store to address A and CPU2 has the cache line containing A, the cache line is invalidated on CPU2 before the store can commit to the L1D on CPU1. So if CPU2 wants to load A after the cache line has been invalidated, it will run into a coherence miss and first needs to get the cache line in, e.g., shared or exclusive state before it can read A. As a consequence, it will see the newest value of A.

So volatile loads and stores have no influence on caches being coherent. On x86 it will not happen that the old value of A is loaded on CPU2 after CPU1 has committed A to the L1D.

The primary purpose of volatile is to prevent reordering with respect to loads and stores to other addresses. On x86 almost all reorderings are prohibited; only older stores can be reordered with newer loads to a different address, due to store buffering. The most sensible way to prevent this is by adding a [StoreLoad] barrier after the write.
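
For illustration only, here is a sketch of that idea in Java (the class FenceSketch is hypothetical; real code should simply declare the field volatile). VarHandle.fullFence() (Java 9+) emits a full fence, which subsumes the [StoreLoad] barrier described above:

import java.lang.invoke.VarHandle;

public class FenceSketch {

    static boolean flag = true;  // plain, non-volatile field

    static void write() {
        flag = false;           // plain store; may linger in the store buffer
        VarHandle.fullFence();  // full fence (includes StoreLoad): later loads
                                // are held back until the store buffer drains
    }
}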

For more info see:
https://shipilev.net/blog/2014/on-the-fence-with-dependencies/

On the JVM this is typically implemented using a lock addl $0,(%rsp), meaning that 0 is added to the word at the top of the stack. An MFENCE would be equally valid. What happens at the hardware level is that the execution of loads is stalled until the store buffer has been drained; hence older stores need to become globally visible (store their content in the L1D) before newer loads can become globally visible (load their content from the L1D), and as a consequence the reordering of older stores with newer loads is prevented.

PS: What Eugene said above is completely valid. It is best to start from the Java Memory Model, which is an abstraction over any hardware (so: no caches). Apart from CPU memory barriers there are also compiler barriers, so my story above only provides a high-level overview of what happens on the hardware. I find it very insightful to have some clue about what happens at the hardware level.
