Difference between ConcurrentHashMap merge() and put()
Question
I recently stumbled upon an article that stresses the importance of the merge() method for performing atomic operations on a ConcurrentHashMap. Here is the link to the article:
https://www.nurkiewicz.com/2019/03/mapmerge-one-method-to-rule-them-all.html
It says that merge() either puts a new value under the given key (if absent) or updates the existing key with the given value (an UPSERT), and it explains this concept.
However, this is exactly what the put() method does.
Calling get() and then put() on a ConcurrentHashMap is not thread-safe as a combined operation.
Please help me understand how merge() handles this get()-then-put() scenario and makes such operations on a ConcurrentHashMap thread-safe.
Answer 1
Score: 4
One key concept of concurrent code is that two atomic operations executed one after the other don't necessarily have the same effect as a single atomic operation doing the same thing.
Let's ignore the case where the map doesn't already contain a value for the key, since that is the less interesting one (there's still a race condition that you'd need to handle, but the other one is more interesting).
So if you already have a value for a given key, then merge basically becomes put(key, mergeFunction(get(key), newValue)).
If you implemented it that way, you could run into a very real issue of losing updates:
- your thread executes get(key) and gets the current value (let's call it v1)
- another thread updates the binding for key to a new value (let's call it v2)
- your thread computes the new updated value using v1 and your argument to merge (let's call it v3)
- your thread updates the binding for key to the newly merged value v3
Note that the update to v2 is basically ignored (overwritten) by your code. If the map values are meant to represent counters, that means you've ignored one update entirely and will get a wrong result.
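The interleaving described above can be replayed deterministically on a single thread, which makes the lost update easy to see. This is only a sketch: the class name LostUpdateReplay, the key "hits", and the starting value 10 are made up for illustration, and the "other thread" is simulated by an ordinary statement placed between the get and the put.

```java
import java.util.concurrent.ConcurrentHashMap;

class LostUpdateReplay {
    // Replays the four steps in order and returns the final counter value.
    static int replay() {
        ConcurrentHashMap<String, Integer> counters = new ConcurrentHashMap<>();
        counters.put("hits", 10);

        // Step 1: "your thread" reads the current value (v1 = 10).
        int v1 = counters.get("hits");

        // Step 2: "another thread" increments the counter and stores v2 = 11.
        counters.put("hits", counters.get("hits") + 1);

        // Step 3: your thread computes v3 from the stale v1 (v3 = 11).
        int v3 = v1 + 1;

        // Step 4: your thread writes v3, overwriting v2.
        counters.put("hits", v3);

        return counters.get("hits");
    }

    public static void main(String[] args) {
        // Two increments happened, but only one is visible.
        System.out.println(replay()); // prints 11
    }
}
```

Every individual get and put here is thread-safe on its own; the update is lost because nothing prevents another write from landing between them.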
What merge provides is a way to apply an update to an existing value atomically, without having to worry about other concurrent updates changing the value out from under you and causing lost updates.
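As a rough illustration of that guarantee, the sketch below has several threads increment the same counter through merge. The class name MergeCounterDemo, the key "hits", and the thread and iteration counts are arbitrary choices for the example. It relies on the documented behavior that ConcurrentHashMap.merge performs the whole read-compute-write as one atomic invocation.

```java
import java.util.concurrent.ConcurrentHashMap;

class MergeCounterDemo {
    // Starts `threads` workers that each add 1 to the same key
    // `incrementsPerThread` times, then returns the final count.
    static int countWithMerge(int threads, int incrementsPerThread) {
        ConcurrentHashMap<String, Integer> counters = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    // Inserts 1 if the key is absent, otherwise stores old + 1.
                    counters.merge("hits", 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counters.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWithMerge(8, 10_000)); // prints 80000
    }
}
```

If the loop body were replaced with counters.put("hits", counters.get("hits") + 1) on a pre-populated key, the total could come out lower than 80000 on some runs, because increments from different threads can overwrite each other.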
Answer 2
Score: 2
The point you are missing is that the merge method accepts a BiFunction (the remappingFunction) that computes the new value, whereas put blindly inserts or replaces a value.
The remappingFunction receives the old and new values; using them, you can perform some computation and return the updated value.
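A small sketch of that difference, using string values so the argument order is visible. The class name MergeVersusPut and the key and values are hypothetical. Note that when the key is absent, merge stores the given value directly and does not call the function.

```java
import java.util.concurrent.ConcurrentHashMap;

class MergeVersusPut {
    static String demo() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();

        // Key absent: "a" is stored as-is, the remapping function is not called.
        map.merge("k", "a", (oldValue, newValue) -> oldValue + "," + newValue);

        // Key present: the function receives ("a", "b") and its result is stored.
        map.merge("k", "b", (oldValue, newValue) -> oldValue + "," + newValue);
        String afterMerge = map.get("k");

        // put ignores whatever was there and simply replaces it.
        map.put("k", "c");
        String afterPut = map.get("k");

        return afterMerge + " | " + afterPut;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints a,b | c
    }
}
```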