HashMap vs ConcurrentHashMap: transfer between threads

Question

I have a question about using maps in a multithreaded application. Suppose we have the following scenario:

  1. A thread receives JSON data as List<Map<String, Object>>, deserialized by Jackson.
  2. This thread modifies the received maps.
  3. It then puts the list into a blocking queue to be consumed by another thread.

As you can see, each map is modified by a single thread only, but then it "becomes" read-only (nothing changes in the object itself, it is just not modified anymore) and is passed to another thread. When I looked into the implementations of HashMap (also TreeMap) and ConcurrentHashMap, I noticed that the latter has volatile fields while the first two don't. So, which implementation of Map should I use in this case? Is ConcurrentHashMap overkill, or must it be used because of the inter-thread transfer?

My simple tests show that I can use HashMap/TreeMap when the modifications are synchronized, and it works, but my conclusion or my test code may be wrong:

import java.util.concurrent.CountDownLatch
import java.util.concurrent.ThreadLocalRandom

def map = new TreeMap() // or HashMap
def start = new CountDownLatch(1)
def threads = (1..5)
println("Threads: " + threads)
def created = new CountDownLatch(threads.size())
def completed = new CountDownLatch(threads.size())
threads.each { i ->
    new Thread({
        def from = i * 10
        def to = from + 10
        def local = (from..to)
        println(Thread.currentThread().name + " " + local)
        created.countDown()
        start.await()
        println('Mutating by ' + local)
        local.each { number ->
            synchronized (map) {
                map.put(number, ThreadLocalRandom.current().nextInt())
            }
            println(Thread.currentThread().name + ' added ' + number + ': ' + map.keySet())
        }
        println 'Done: ' + Thread.currentThread().name
        completed.countDown()
    }).start()
}

created.await()
start.countDown()
completed.await()
println('Completed:')
map.each { e ->
    println('' + e.key + ': ' + e.value)
}

The main thread spawns 5 child threads which update the common map synchronously. When they complete, the main thread successfully sees all updates by child threads.

Answer 1

Score: 2

The java.util.concurrent classes have special guarantees regarding sequencing:

> Memory consistency effects: As with other concurrent collections, actions in a thread prior to placing an object into a BlockingQueue happen-before actions subsequent to the access or removal of that element from the BlockingQueue in another thread.

This means that you are free to use any kind of mutable object and manipulate it as you wish, then put it into the queue. When it's retrieved, all of the manipulations you've applied will be visible.
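As a rough illustration of that guarantee, here is a minimal Java sketch of the hand-off. The class name QueueHandoff, the method produceAndConsume and the map keys are invented for the example; the only assumption about the real application is that the list travels through a java.util.concurrent BlockingQueue (a LinkedBlockingQueue here).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical example class, not taken from the question.
final class QueueHandoff {

    // The calling thread mutates a plain HashMap, then publishes it through the queue.
    // Returns the value the consumer thread observed under the key "status".
    static Object produceAndConsume() {
        BlockingQueue<List<Map<String, Object>>> queue = new LinkedBlockingQueue<>();
        Object[] seen = new Object[1];

        Thread consumer = new Thread(() -> {
            try {
                // take() happens-after the matching put(), so the producer's
                // earlier writes to the map are visible here.
                List<Map<String, Object>> received = queue.take();
                seen[0] = received.get(0).get("status");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        Map<String, Object> record = new HashMap<>();
        record.put("id", 1);
        record.put("status", "processed"); // all mutation happens before put()
        List<Map<String, Object>> batch = new ArrayList<>();
        batch.add(record);

        try {
            queue.put(batch);   // after this point the producer no longer touches the maps
            consumer.join();    // join() makes the consumer's write to seen[0] visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(produceAndConsume());
    }
}
```

The sketch only stays safe as long as the producer stops modifying the maps once the list has been put into the queue.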

(Note more generally that the kind of test you demonstrated can only prove lack of safety; in most real-world cases, unsynchronized code works fine 99% of the time. It's that last 1% that bites you.)

Answer 2

Score: 1

This question has a broad scope.

Your original scenario

You say:

> [A] map is modified only by single thread, but then it "becomes" read-only

The tricky part is the word "then". When you, the programmer, say "then", you refer to clock time: I've done this, now do that. But for a wide variety of reasons (compiler reordering and CPU caches, to name two), the computer does not "think" (execute code) this way. What happened before and what happens after need to be synchronized explicitly for the computer to see the world the way we see it.

That's how the Java Memory Model expresses it: if you want your objects to behave predictably in a concurrent environment, you have to make sure that you establish "happens-before" boundaries.

A few things establish happens-before relationships in Java code. Simplifying a bit, and just to name a few:

  • The order of execution in a single thread: if statements 1 and 2 are executed by the same thread in that order, whatever 1 did is always visible to statement 2.
  • When thread t1 start()s t2, everything that t1 did before starting t2 is visible to t2. Conversely, everything t2 did is visible to t1 once t1's call to t2.join() returns.
  • The same goes for synchronized and object monitors: everything a thread did before leaving a synchronized block is visible to any thread that subsequently synchronizes on the same instance.
  • The same goes for the specialized methods of java.util.concurrent classes: Locks and Semaphores, of course, but also collections. If you put an element into a concurrent collection, the thread that pulls it out has a happens-before relationship with the thread that put it in.
  • The relationship is transitive: if T2 has a happens-before relationship with T1, and T3 has one with T2, then T3 also has one with T1.
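The start()/join() rule from the list above can be sketched in a few lines of Java. This is only an illustration: the class StartJoinVisibility and the method sumViaWorker are made-up names, and the map contents are arbitrary.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class illustrating the start()/join() rule.
final class StartJoinVisibility {

    // Fills a plain HashMap, lets a worker thread sum its values,
    // and returns the sum computed by the worker.
    static int sumViaWorker() {
        Map<String, Integer> input = new HashMap<>();
        input.put("a", 1);
        input.put("b", 2); // written before start(), so visible to the worker

        int[] result = new int[1];
        Thread worker = new Thread(() -> {
            int sum = 0;
            for (int value : input.values()) {
                sum += value;
            }
            result[0] = sum; // written before the worker ends, so visible after join()
        });

        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(sumViaWorker()); // prints 3
    }
}
```

No volatile field or lock is involved here; visibility in both directions comes from start() and join() alone.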

So back to your phrase:

> then it "becomes" read-only

It does become read-only. But for the computer to see it, you have to give a meaning to "then", which is: you have to put a happens-before relationship in your code.

Later on you state:

> And then puts list into blocking queue

A java.util.concurrent queue? How neat is that! It just so happens that a thread pulling an object out of a concurrent queue has a "happens-before" relationship with respect to the thread that put that object into the queue.

You have established the relationship. All mutations made beforehand by the thread that put the object into the queue are safely visible to the one that pulls it out. You do not need a ConcurrentHashMap in this case (provided no other thread mutates the same data, of course).

Your sample code

Your sample code does not use a queue. And it mutates a single map from multiple threads (rather than a single thread modifying each map, as in your scenario). So it's just... not the same. But either way, your code's fine.

Threads accessing the map do it like so:

synchronized (map) {
    map.put(number, ThreadLocalRandom.current().nextInt())
}

The synchronized block provides 1) mutual exclusion between the threads and 2) a happens-before relationship. So each thread that enters the synchronized block sees everything that "happened before" in any other thread that previously synchronized on the same map (which is all of them).

So no problem here.

And then your main thread does:

completed.await()
println('Completed:')
map.each { e ->
    println('' + e.key + ': ' + e.value)
}

The thing that saves you here is completed.await(). It establishes a happens-before relationship with every thread that called countDown(), which is all of them. So your main thread sees everything that was done by the worker threads. All is fine.
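For readers who prefer Java to Groovy, the same pattern might look roughly like the sketch below. It is not a line-by-line port of the test in the question: the names LatchedWriters and fillWithWorkers are invented, the start latch is left out, and each worker gets a disjoint key range so that the final size is predictable.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical example class: several writers, one plain TreeMap, one latch.
final class LatchedWriters {

    // Starts `workers` threads that each put `keysPerWorker` distinct keys
    // into a shared TreeMap, then returns the map once all of them are done.
    static Map<Integer, String> fillWithWorkers(int workers, int keysPerWorker) {
        Map<Integer, String> map = new TreeMap<>(); // created before any start()
        CountDownLatch completed = new CountDownLatch(workers);

        for (int w = 0; w < workers; w++) {
            final int base = w * keysPerWorker; // disjoint key range per worker
            new Thread(() -> {
                for (int key = base; key < base + keysPerWorker; key++) {
                    synchronized (map) { // mutual exclusion between the writers
                        map.put(key, Thread.currentThread().getName());
                    }
                }
                completed.countDown(); // publishes this worker's writes to the waiter
            }).start();
        }

        try {
            completed.await(); // returns only after every worker has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map; // safe to read without locking: all writers are finished
    }

    public static void main(String[] args) {
        System.out.println(fillWithWorkers(5, 10).size()); // prints 50
    }
}
```

The synchronized block is what keeps concurrent writers from corrupting the TreeMap; the latch is what lets the calling thread read the result afterwards without taking the lock.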

Except... we often forget to check the bootstrap of the threads. The first time a worker synchronizes on the map instance, nobody has done so before. How can we be sure that the workers see a map instance that is fully initialized and ready?

Well, for two reasons:

  1. You initialize the map instance BEFORE calling thread.start(), which establishes a happens-before relationship. This alone would be enough.
  2. Inside your worker threads, you also wait on latches before starting the work, which again establishes a relationship.

You're doubly safe.

huangapple
  • Posted on September 18, 2020, 16:50:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63952419.html