ArrayList and Vector Comparison

Question

In an assignment question, it was asked whether it is good to use an ArrayList when multiple threads are accessing and modifying it, and if not, what the best approach would be.

I know that ArrayList is not synchronized, so from a performance standpoint it seems like a good choice.

But because multiple threads are modifying and accessing it, I think the integrity of the data may not be guaranteed. From that perspective, Vector seems more suitable when data integrity matters, because Vector is synchronized.

I would like to know whether my reasoning is correct, or whether ArrayList is the best choice in the scenario described above.


Answer 1

Score: 1

If you use Vector, it will always carry some overhead for synchronization. An ArrayList is not thread-safe, but you can handle the synchronization for your threads yourself, e.g. with a synchronized block on a shared lock object, or by wrapping the list:

    Collections.synchronizedList(new ArrayList<String>());

Then you can still access the ArrayList after your thread tasks finish, without the overhead of Vector. If you are using the list only inside threads, I guess Vector is also a good choice; personally, I try to avoid Vector as often as I can.
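
As a minimal sketch of that approach (my own illustration, not code from the original answer; class and variable names are made up), several threads append to the wrapped list, and the wrapper object itself is locked for the compound operation of iterating, as the Collections.synchronizedList documentation requires:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class SynchronizedListDemo {
        public static void main(String[] args) throws InterruptedException {
            // Wrap a plain ArrayList so individual calls (add, get, size, ...) are synchronized.
            List<String> results = Collections.synchronizedList(new ArrayList<String>());

            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int t = 0; t < 4; t++) {
                final int id = t;
                pool.submit(() -> {
                    for (int i = 0; i < 1000; i++) {
                        results.add("thread-" + id + "-item-" + i); // each add() call is synchronized
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);

            // Iteration is a compound operation and is NOT made atomic by the wrapper:
            // hold the wrapper's lock for the whole traversal.
            synchronized (results) {
                long fromThreadZero = 0;
                for (String s : results) {
                    if (s.startsWith("thread-0")) {
                        fromThreadZero++;
                    }
                }
                System.out.println(results.size() + " elements, " + fromThreadZero + " from thread 0");
            }
        }
    }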

Answer 2

Score: 0

I only have experience with the likes of concurrent_vector in Intel's Threading Building Blocks and Microsoft's Parallel Patterns Library, but I would think they are probably comparable to Java's Vector.

From my tests, concurrent_vector in both is quite a bit slower than most alternatives, such as running multiple threads that gather results in local, thread-unsafe containers (like std::vector in C++) and then append them to a shared collection inside a lock, or creating a list of lists where each thread writes to its own list (using some array index) and the results are then combined serially into a single list at the end.

The benefit of the thread-safe version is convenience, as far as I can tell. I have never found a concurrent, thread-safe random-access sequence, even in Intel's own library, that I could not beat with a thread-unsafe alternative in which I accumulate results locally within each thread and then use some basic thread synchronization to combine them serially in one thread. If my tests are write-heavy, this crude method is often 2-3 times faster than the concurrent container.

That said, the case where your time is actually skewed towards writing/appending elements to a container is probably rare. Far more often I find that the bulk of my real-world threaded work is spent reading and crunching numbers and the like, and only a small portion of the time goes to pushing results to the back of a container. So very often the overhead of a concurrent container becomes negligible, and it is certainly a whole lot more convenient and far less prone to misuse across threads.

In the probably-rare case where writing to a container takes up a good chunk of the time spent in a parallel algorithm, concurrent containers have never offered a performance benefit over the crude alternatives in my tests and experience. So if it is a particularly performance-critical section of code, and you are seeing hotspots in the methods of the concurrent container in your profiling sessions, I would try alternatives that accumulate output in thread-local containers that are not concurrent (e.g., a thread-local ArrayList) and then combine all the results at the end of the algorithm (e.g., into a combined ArrayList), serially from one thread or inside a lock/critical section.
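
As a rough sketch of that accumulate-then-combine pattern (my own illustration under the assumptions above; class names and sizes are arbitrary), each worker fills its own private ArrayList with no locking, and one thread merges the results after all workers are done:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class LocalAccumulationDemo {
        public static void main(String[] args) throws InterruptedException, ExecutionException {
            int threads = 4;
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            List<Future<List<Integer>>> futures = new ArrayList<>();

            for (int t = 0; t < threads; t++) {
                final int id = t;
                // Each task writes only to its own local list: no locking, no contention.
                Callable<List<Integer>> task = () -> {
                    List<Integer> local = new ArrayList<>();
                    for (int i = 0; i < 100_000; i++) {
                        local.add(id * 100_000 + i); // placeholder "work": distinct values per thread
                    }
                    return local;
                };
                futures.add(pool.submit(task));
            }

            // Combine serially in one thread once the workers have finished.
            List<Integer> combined = new ArrayList<>();
            for (Future<List<Integer>> f : futures) {
                combined.addAll(f.get());
            }
            pool.shutdown();

            System.out.println("combined size: " + combined.size());
        }
    }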

Unfortunately, it's tricky to make things both thread-safe and maximally scalable. You could make things maximally thread-safe architecturally by just executing everything on one thread. Thread-safety solved! But that doesn't scale at all to take advantage of parallelism. I find concurrency to be like that: a balancing act headbutting Amdahl's Law. Concurrent containers sit in the middle, in the sense that using them almost always sacrifices optimal performance to some degree, but the difference between optimal and not-quite-optimal might be quite negligible. As I see it, you measure and find out.

As for this part of your question:

> But due to the multiple Threads modifying and accessing it I think
> that the integrity of data may not be secured

That's design-related from my perspective. It's been a good number of years since I thought about this from the theoretical standpoint of computer science, but there is an implicit assumption here in the notion that this data must be shared. Depending on the user-end requirements, the data may or may not need to be shared. Take a video game accessing data from a scene. It might seem that the rendering engine, the physics engine, and so forth must all share the same scene. That makes intuitive sense. That makes human sense.

Yet that's not necessarily the case. From a user-end standpoint, it might not matter if the renderer has its own copy of the scene that it uses to render results to the screen, one that might be slightly out of sync with other things going on. So there are often times, at least when optimal performance, frame rates, or minimal waiting times are required, where you can duplicate/copy data so that threads can go as fast as they can without any thread synchronization (excluding even atomic operations) to access shared data. This can get very elaborate, with persistent data structures and the like, or stay simple; it depends on the design. But I think one counter-intuitive aspect of multithreaded programming is that we are first tempted to think of more things as needing to be shared among threads than really need to be, and discarding that assumption can produce a whole new degree of parallelism. I've found, with many of my colleagues and myself, that having concurrent containers at our fingertips tempts us to share more data between threads than we really have to. If utilizing the hardware as effectively as we can is the goal, then it often makes sense to abandon the notion of shared data as much as we can, including these concurrent containers. I would actually suggest minimizing their usage if your software is as performance-critical as a game engine.
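
A hypothetical sketch of that "copy instead of share" idea, with made-up names purely for illustration: the reader thread works off an immutable snapshot taken before it starts, so it needs no locks or atomics at all, while the original list keeps being modified independently and the reader simply sees a slightly older, consistent view.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class SnapshotDemo {
        public static void main(String[] args) throws InterruptedException {
            List<String> liveScene = new ArrayList<>();
            liveScene.add("player");
            liveScene.add("terrain");

            // Take an immutable copy once; the reader thread touches only this copy,
            // so it needs no synchronization while it runs.
            List<String> snapshot = Collections.unmodifiableList(new ArrayList<>(liveScene));

            Thread renderer = new Thread(() -> {
                for (String object : snapshot) {
                    System.out.println("rendering " + object); // lock-free read of a private copy
                }
            });
            renderer.start();

            // Meanwhile the "simulation" side keeps mutating its own live copy freely.
            liveScene.add("particles");

            renderer.join();
        }
    }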
