How does thread locking work, at a high level, in the context of multiple cores?
Question
I am trying to understand multi-threading and thread safety. Java obviously has keywords such as synchronized to achieve that.
However, consider this scenario.
Let's say our CPU has multiple cores, so we can execute more than one instruction at a time. If the instruction to obtain a lock is executed in parallel, simultaneously, on multiple cores for multiple threads, how will that work, at a high level?
Answer 1
Score: 2
The point of a JVM is: It doesn't matter how it works, just that it works. Specifically, the JVM guarantees that no two threads can both enter a synchronized (x) {} block where x is a reference to the same Java object.
There are various ways that hardware handles this.
One is CAS operations (Compare-And-Set). A CAS operation works like this:
Given CAS(memLoc, 5, 20);
- Read the value at memory address memLoc. IF it is '5', update it to '20' and return true. Otherwise, do not change it, and return false.
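To make those semantics concrete, here is a minimal Java sketch that merely models the primitive (the class name CasCell is just for illustration, and the synchronized keyword below only stands in for the atomicity the hardware provides in a single instruction; real code would use java.util.concurrent.atomic rather than hand-rolling this):

// A model of CAS semantics only - not how the JVM or the hardware actually implements it.
class CasCell {
    private long value;

    CasCell(long initial) { this.value = initial; }

    // Atomically: if value == expected, set it to newValue and return true; else return false.
    // 'synchronized' here simulates the atomicity of the single hardware instruction.
    synchronized boolean compareAndSet(long expected, long newValue) {
        if (value != expected) return false;
        value = newValue;
        return true;
    }

    synchronized long get() { return value; }
}

So CAS(memLoc, 5, 20) corresponds to cell.compareAndSet(5, 20): of any number of concurrent callers passing the matching expected value, exactly one wins.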
Various CPUs natively support CAS operations. At the CPU level, the CAS operation is conveyed to the thing that actually locally controls that memory (i.e. if there is a separate memory controller, the CAS instruction is sent to it, too), which in the end is a small but crucial single-system bottleneck (no risk of 2 cores simultaneously doing the thing, because it's not a CPU core that does this, it's the memory controller).
If CAS is available, stuff like this is simple. For example, we can give all objects a separate 64-bit field, and we have a single 64-bit 'thread ID' counter. Then we modify a few basics in the JVM:
- Whenever a new thread is made, we have a loop that works like this:
// Pseudocode: allocate a unique id by CAS-ing a global 64-bit counter.
long threadId = -1;
while (true) {
    long currentThreadId = read(globalThreadIdCounterMemLoc); // read the current counter value
    threadId = currentThreadId + 1;
    // Only one competing thread wins this CAS; the losers loop and retry with the fresh value.
    if (CAS(globalThreadIdCounterMemLoc, currentThreadId, threadId)) break;
}
return createNewThreadWithId(threadId);
This ensures every thread has a guaranteed unique ID. Once all threads have a unique ID:
- Add a threadOwner field to all objects.
- Implement synchronized as something like:
// Pseudocode. An owner id of -1 means: Unowned.
public boolean jvmSynchronizedImpl(Thread owner, Object lockObj) {
    while (true) {
        // Succeeds only if no thread currently owns lockObj.
        boolean lockAcquired = CAS(lockObj.threadOwner, -1, owner.threadID);
        if (lockAcquired) return true;
    }
}
Obviously it doesn't work that simply - waiting to acquire a lock does not 'busy-wait' like the loop above (where the CPU fans start blasting and your power usage goes way up because the CPU is actively and continuously checking the value); if the lock isn't available, a real implementation makes the thread wait instead.
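For what it's worth, the busy-waiting variant of the pseudocode above can be written in ordinary Java on top of java.util.concurrent.atomic. This is only an illustrative sketch (the class name SpinLock and the -1 'unowned' convention are mine); a real JVM monitor additionally parks waiting threads instead of spinning, supports reentrancy, and so on:

import java.util.concurrent.atomic.AtomicLong;

// Illustrative spin lock built on CAS; real monitors park waiters rather than spin.
class SpinLock {
    private static final long UNOWNED = -1L;
    private final AtomicLong ownerThreadId = new AtomicLong(UNOWNED);

    void lock() {
        long me = Thread.currentThread().getId();
        // Keep retrying the CAS until we flip the owner field from 'unowned' to our id.
        while (!ownerThreadId.compareAndSet(UNOWNED, me)) {
            Thread.onSpinWait(); // hint to the CPU that we're busy-waiting (Java 9+)
        }
    }

    void unlock() {
        ownerThreadId.set(UNOWNED);
    }
}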
Even if CAS is not available, CPU cores have LOCK instructions. For example, the x86/x64 architecture has a 'lock' modifier; it modifies any opcode that read-modify-writes memory, such as an INC instruction (which needs to read a value from a location, increment it by 1, and write it back; a short demo of why that three-step sequence needs protection follows a few paragraphs below). If the opcode has a lock flag on it, the CPU ensures exclusive ownership of the relevant data. We just kicked the can down the road:
- How does a JVM implement locking? By relying on the CPU to do it.
Which just means we now ask: ... so, how does a CPU do it?
And the exact same rule applies: It does it. The docs spell out precisely what is guaranteed and what isn't - the CPU you buy today might implement the 'deal' as spelled out in the specs completely differently from another.
But, generally, with similar CAS-like mechanics on a shared register that all CPU cores can see. Or, it just locks a bus (and how does that work - oof, it's turtles all the way down: The bus has some specification that guarantees that if you do X and Y, then you get synchronicity guarantees, and does not spell out how it does that).
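Here is the promised demo of that unprotected read-modify-write (plain Java, nothing JVM-specific assumed; exact results vary by machine). Two threads doing counter++ on a shared int usually lose updates, because each ++ is a separate read, add, and write that can interleave:

// Demo: unguarded counter++ from two threads typically loses updates.
class LostUpdateDemo {
    static int counter = 0; // deliberately not atomic, not volatile, not synchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                counter++; // read, add 1, write back - three steps that can interleave
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        // Typically prints something well below 2,000,000.
        System.out.println("counter = " + counter);
    }
}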
NB: CAS is sometimes called test-and-set. You might want to read the wikipedia page on test-and-set, which also covers hardware implementations. As I showed at the top, CAS is a really useful in-between: It's easy to understand, generally easy to implement by the 'lowest level thing in the hardware', and it is easy to convey (just 3 values: Location, Expected Value, and New Value - with a single bit returned by the operation). With CAS in your arsenal, writing more complicated synchronicity systems is easy (such as Java's synchronized, but also things like AtomicInteger's incrementAndGet(), which is considerably faster than using synchronized to get a consistent, multi-core capable atomic incrementer). Hence why CAS is usually the solution used when all you can really do is ask the underlying whatever-it-may-be to provide you with the tools you need to make synchronicity promises. Hence, Java code asks the JVM for CAS. The JVM asks the OS. The OS asks the CPU. The CPU asks the memory controller. The memory controller can actually natively implement the construct.
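As a sketch of what 'building more complicated things out of CAS' looks like in practice, this is roughly how an incrementAndGet-style counter can be written as a CAS retry loop (AtomicInteger already ships this method; the hand-rolled CasCounter below is only to show the pattern):

import java.util.concurrent.atomic.AtomicInteger;

// The classic CAS retry loop: read, compute, attempt to publish, retry on contention.
class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    int incrementAndGet() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            // Only succeeds if no other thread changed the value in between.
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}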
Important footnote: As I said at the top, this is just one way that a JVM could implement this stuff. You cannot rely on a JVM actually doing it this way; it might use completely different systems instead. "I want to know how it works" is not a particularly useful way to try to wrap your head around concurrency, because any single answer is by definition non-general (it does not apply to all JVMs and all situations). The Java Memory Model section of the Java specification spells out all the guarantees you do get, and if you write code that relies solely on those guarantees, your code is correct. If you rely on things you have reliably observed but which the JMM doesn't guarantee, you have a very, very nasty thing: a bug in your code that you cannot possibly test for. Your code will work fine today, and tomorrow, but next week it starts failing. Or it works for you but fails on Tuesdays. Or it works for you but not for your important customer. And so on.
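One hedged illustration of "works when you test it, but is not guaranteed by the JMM": a stop flag read without any synchronization. On many machines the loop below ends promptly, but nothing in the Java Memory Model guarantees the worker thread ever sees the write; marking the field volatile (or guarding it with a lock) is what actually buys that guarantee.

// Broken-by-the-spec example: the worker may never observe 'running = false',
// even though it often appears to work in testing.
class StopFlag {
    static boolean running = true; // should be 'volatile' to guarantee visibility

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) {
                // spin; the JMM does not require this thread to ever re-read the field
            }
            System.out.println("worker stopped");
        });
        worker.start();
        Thread.sleep(100);
        running = false; // may or may not become visible to the worker
        worker.join();
    }
}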
Comments