Is it OK to modify items in an ArrayList from multiple threads, if those threads never modify the same item?

Question

A bit of (simplified) context.

Let's say I have an ArrayList<ContentStub> where ContentStub is:

public class ContentStub {
    ContentType contentType;
    Object content;
}

And I have multiple implementations of classes that "inflate" stubs for each ContentType, e.g.

public class TypeAStubInflater {

    public void inflate(List<ContentStub> contentStubs) {
        contentStubs.forEach(stub -> {
            if (stub.contentType == ContentType.TYPE_A) {
                stub.content = someService.getContent();
            }
        });
    }
}

The idea is that there is a TypeAStubInflater, which only modifies items of ContentType.TYPE_A, running in one thread, and a TypeBStubInflater, which only modifies items of ContentType.TYPE_B, running in another, and so on. Each instance's inflate() method, however, modifies items in the same contentStubs List, in parallel.

However:

  • No thread ever changes the size of the ArrayList
  • No thread ever attempts to modify a value that's being modified by another thread
  • No thread ever attempts to read a value written by another thread

Given all this, it seems that no additional measures to ensure thread-safety are necessary. From a (very) quick look at the ArrayList implementation, there seems to be no risk of a ConcurrentModificationException. However, that doesn't mean that something else can't go wrong. Am I missing something, or is this safe to do?

Answer 1

Score: 1

In general, that will work, because you are not modifying the state of the List itself (a structural change, such as adding or removing elements, would cause a ConcurrentModificationException in any iterator that is active at that time). You are only modifying objects inside the list, which is fine from the list's point of view.

I would recommend splitting your list into a Map<ContentType, List<ContentStub>> and then starting threads with those specific lists.

You could convert your list to a map with this:

Map<ContentType, List<ContentStub>> typeToStubsMap = stubs.stream().collect(Collectors.groupingBy(stub -> stub.contentType));
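To make the splitting idea concrete, here is a minimal, self-contained sketch. It assumes a two-value ContentType enum and uses a placeholder string in place of someService.getContent(); the class name GroupedInflateSketch and the helpers groupByType and inflateAll are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupedInflateSketch {

    // Assumed for illustration; the question does not show the enum.
    enum ContentType { TYPE_A, TYPE_B }

    static class ContentStub {
        final ContentType contentType;
        Object content;

        ContentStub(ContentType contentType) {
            this.contentType = contentType;
        }
    }

    // Splits the stubs into one list per ContentType.
    static Map<ContentType, List<ContentStub>> groupByType(List<ContentStub> stubs) {
        return stubs.stream().collect(Collectors.groupingBy(stub -> stub.contentType));
    }

    // Starts one thread per group, then waits for all of them to finish.
    static void inflateAll(List<ContentStub> stubs) {
        List<Thread> workers = new ArrayList<>();
        groupByType(stubs).forEach((type, group) -> {
            // Placeholder for someService.getContent(): a string naming the type.
            Thread worker = new Thread(() ->
                    group.forEach(stub -> stub.content = "content for " + type));
            workers.add(worker);
            worker.start();
        });
        try {
            for (Thread worker : workers) {
                // join() also makes the worker's writes visible to the calling thread.
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for inflaters", e);
        }
    }

    public static void main(String[] args) {
        List<ContentStub> stubs = Arrays.asList(
                new ContentStub(ContentType.TYPE_A),
                new ContentStub(ContentType.TYPE_B),
                new ContentStub(ContentType.TYPE_A));
        inflateAll(stubs);
        stubs.forEach(stub -> System.out.println(stub.contentType + " -> " + stub.content));
    }
}
```

Because each thread only ever touches its own sub-list, no thread iterates over stubs it does not own. The stub objects are still the same instances as in the original list, so the caller can read the results from either collection once inflateAll has returned.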

If your List is not that big (<1000 entries), I would even recommend not using any threading at all and just iterating with a plain indexed for loop, or even .forEach if the two extra integers are no concern.

Answer 2

Score: 1

Let's assume thread A writes TYPE_A content and thread B writes TYPE_B content. The List contentStubs is only used to obtain instances of ContentStub, so it is accessed read-only. From the perspective of A, B and contentStubs, there is no problem. However, the updates made by threads A and B are not guaranteed to ever be seen by another thread: a thread C may well conclude that stub.content == null for all elements in the list.

The reason for this is the Java Memory Model. If you don't use constructs like locks, synchronization, volatile and atomic variables, the memory model gives no guarantee if and when modifications of an object by one thread become visible to another thread. To make this a little more practical, here is an example.

Imagine that a thread A executes the following code:

    stub.content = someService.getContent(); // happens to be element[17]

List element 17 is a reference to a ContentStub object on the global heap. Conceptually, the VM is allowed to work on a thread-private copy of that object. All subsequent accesses through that reference in thread A use the copy. The VM is free to decide when, and whether, to write the change back to the original object on the global heap.

Now imagine a thread C that executes the following code:

    ContentStub stub = contentStubs.get(17);

The VM may do the same trick with a private copy in thread C.

If thread C already accessed the object before thread A updated it, thread C may keep using its stale copy and ignore the global original for a long time. But even if thread C accesses the object for the first time after thread A updated it, there is no guarantee that the changes in the private copy of thread A have already ended up in the global heap.

In short: without a lock or some other form of synchronization, thread C is not guaranteed to see the new values and may read only null in each stub.content.
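One way to get the missing guarantee without touching ContentStub is to run the inflaters through an ExecutorService and wait on their Futures: everything a task did happens-before the return of Future.get() in the waiting thread. The sketch below assumes a two-value ContentType enum and a placeholder string instead of someService.getContent(); ExecutorInflateSketch, inflaterFor and inflateInParallel are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorInflateSketch {

    // Assumed for illustration; the question does not show the enum.
    enum ContentType { TYPE_A, TYPE_B }

    static class ContentStub {
        final ContentType contentType;
        Object content;

        ContentStub(ContentType contentType) {
            this.contentType = contentType;
        }
    }

    // Builds a task that walks the shared list and only writes stubs of its own type.
    static Callable<Void> inflaterFor(ContentType type, List<ContentStub> stubs) {
        return () -> {
            for (ContentStub stub : stubs) {
                if (stub.contentType == type) {
                    // Placeholder for someService.getContent().
                    stub.content = "content for " + type;
                }
            }
            return null;
        };
    }

    // Runs one inflater per type in parallel and returns once all have finished.
    static void inflateInParallel(List<ContentStub> stubs) {
        ExecutorService pool = Executors.newFixedThreadPool(ContentType.values().length);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (ContentType type : ContentType.values()) {
                tasks.add(inflaterFor(type, stubs));
            }
            for (Future<Void> result : pool.invokeAll(tasks)) {
                // get() rethrows task failures and makes the task's writes visible here.
                result.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while inflating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("An inflater failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<ContentStub> stubs = Arrays.asList(
                new ContentStub(ContentType.TYPE_A),
                new ContentStub(ContentType.TYPE_B));
        inflateInParallel(stubs);
        stubs.forEach(stub -> System.out.println(stub.contentType + " -> " + stub.content));
    }
}
```

Any thread that reads the stubs after inflateInParallel has returned in that same thread (or after being handed the list through another synchronizing action) sees the inflated content. A thread that polls the stubs while the inflaters are still running gets no such guarantee.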

The reason for this memory model is performance. On modern hardware, there is a trade-off between performance and consistency across all CPUs/cores. If the memory model of a language required full consistency, that would be very hard to guarantee on all hardware and would likely cost too much performance. Modern languages therefore embrace weak consistency and offer the developer explicit constructs to enforce it when needed. In combination with instruction reordering by both compilers and processors, that makes old-fashioned linear reasoning about your program code ... interesting.
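As a small illustration of one such explicit construct, the sketch below declares the content field volatile, so a write by one thread is guaranteed to become visible to a later read of that field in another thread. This is a variation on the ContentStub from the question, not the original class, and VolatileContentSketch, awaitContent and publishAndRead are hypothetical names.

```java
public class VolatileContentSketch {

    static class ContentStub {
        // volatile: a write to this field happens-before every later read of it.
        volatile Object content;
    }

    // Plays the role of thread C: polls until some writer has published content.
    static Object awaitContent(ContentStub stub) {
        Object seen = stub.content;
        while (seen == null) {
            Thread.yield();
            seen = stub.content;
        }
        return seen;
    }

    // Starts a writer thread (thread A) and reads its result from the calling thread.
    static Object publishAndRead() {
        ContentStub stub = new ContentStub();
        Thread writer = new Thread(() -> stub.content = "inflated");
        writer.start();
        return awaitContent(stub);
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead()); // prints "inflated"
    }
}
```

Without the volatile modifier, the loop in awaitContent would have no guarantee of ever terminating, because the reader would be allowed to keep using a stale value. Busy-waiting is used here only to keep the sketch short; waiting on a Future or a CountDownLatch is usually the better design.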

huangapple
  • Posted on August 20, 2020 at 16:49:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63501515.html