Java: what is the best approach for high-performance multi-threading in a time-critical application?
Question
I'm developing a network proxy application using Java 8. For ingress traffic, the main logic is the data-processing loop: take a packet from the inbound queue, process its content (e.g. protocol adaptation), and put it in the send queue. The design allows multiple virtual TCP channels, so each thread in a pool of data-processing threads handles a subset of the channels for a given period of time (e.g. the channels with channel.channelId % NUM_DATA_PROCESSING_THREADS = 0), as determined by a load-balancing scheduler. Channels are stored in an array and accessed using the channelId as the index. The array is wrapped by a class that provides methods such as register, deregister, getById, and size; its instance is called CHANNEL_STORE in the program. These methods are called from different threads (at least the dispatcher thread, the data-processing threads, and the control-operation thread that destroys a channel from the GUI), including from within the data-processing loop, so I need to consider concurrency among these threads. I have several candidate approaches:
- Use synchronized or reentrant locks around register, deregister, getById, etc. This is the simplest approach and it is thread-safe. But I have performance concerns about the locking (CAS) mechanism, since I need to call operations on CHANNEL_STORE (especially getById) at a very high frequency.
- Delegate the operations on CHANNEL_STORE to a SingleThreadExecutor via executor.execute(runnable) and/or executor.submit(callable). My concern is the cost of creating a Runnable/Callable at each such point in the data-processing loop: creating the Runnable instance and calling execute. I have no idea whether this would be even more expensive than synchronized or reentrant locks. In practice (so far) there is no post-operation in the data-processing loop, so it only needs to submit a Runnable without waiting for a Callable to return, although a post-operation is needed in the control loop.
- Delegate the operations on CHANNEL_STORE to a dedicated thread through a pair of ArrayBlockingQueues instead of an Executor. For each access to CHANNEL_STORE, put a task indicator together with its parameters on the first queue; the dedicated thread loops on this queue with the blocking method take and operates on CHANNEL_STORE. It then puts the result on the second queue so that the caller can continue with the post-operation (not needed at present, however). I regard this as the fastest, assuming the blocking queues in the JVM are lock-free. My concern is that the code becomes very messy and error-prone.
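For comparison with the candidates above, here is a minimal sketch of a variation on the first approach that swaps the lock for java.util.concurrent.atomic.AtomicReferenceArray, so that getById becomes a single volatile read. The Channel and ChannelStore classes and their method signatures are assumptions for illustration, not the real code; the sketch also assumes every channelId is a valid index below the store's capacity (otherwise an IndexOutOfBoundsException is thrown).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical stand-in for the real channel type.
final class Channel {
    final int channelId;
    Channel(int channelId) { this.channelId = channelId; }
}

// Sketch of an array-backed store whose reads take no lock.
final class ChannelStore {
    private final AtomicReferenceArray<Channel> slots;
    private final AtomicInteger size = new AtomicInteger();

    ChannelStore(int capacity) {
        slots = new AtomicReferenceArray<>(capacity);
    }

    // Claims the slot with one CAS; returns false if it is already taken.
    boolean register(Channel ch) {
        if (slots.compareAndSet(ch.channelId, null, ch)) {
            size.incrementAndGet();
            return true;
        }
        return false;
    }

    // Clears the slot; returns the removed channel, or null if it was empty.
    Channel deregister(int channelId) {
        Channel old = slots.getAndSet(channelId, null);
        if (old != null) {
            size.decrementAndGet();
        }
        return old;
    }

    // Hot path: a single volatile read.
    Channel getById(int channelId) {
        return slots.get(channelId);
    }

    // May briefly lag behind the slots while another thread is mid-update.
    int size() {
        return size.get();
    }
}
```

A caller of getById still has to cope with a null result, or with a channel that another thread deregisters right after the lookup; this sketch does not address that.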
I think the 2nd and 3rd approaches may be called "serialization".
The reason I cannot simply hand tasks to a thread pool for data processing and forget about them is that the TCP stream packets of each channel must not be reordered; they have to be processed serially on a per-channel basis.
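The per-channel ordering constraint can be sketched as follows: every packet of a given channel is routed to the same single-threaded worker, so packets of one channel run in submission order while different channels still run in parallel. The ChannelAffinityDispatcher class is hypothetical, and the fixed channelId % N mapping is an assumption standing in for the load-balancing scheduler described earlier. It still allocates one Runnable per packet, so it illustrates the ordering guarantee, not an allocation-free design.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical dispatcher: one single-threaded executor per data-processing thread.
final class ChannelAffinityDispatcher {
    private final ExecutorService[] workers;

    ChannelAffinityDispatcher(int numThreads) {
        workers = new ExecutorService[numThreads];
        for (int i = 0; i < numThreads; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    // All packets of one channel land on the same worker, which runs
    // its tasks sequentially in the order they were submitted.
    void dispatch(int channelId, Runnable packetTask) {
        int index = Math.floorMod(channelId, workers.length);
        workers[index].execute(packetTask);
    }

    // Stops accepting work and waits (up to 10 s per worker) for queued tasks.
    void shutdownAndAwait() {
        for (ExecutorService w : workers) {
            w.shutdown();
        }
        try {
            for (ExecutorService w : workers) {
                w.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }
}
```

With a single dispatcher thread calling dispatch, the arrival order of each channel's packets is also the order in which they are processed.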
Questions:
- How does the performance of the second approach compare to the first?
- What would you suggest for my situation?
 
I'm currently using stream I/O for LAN reads and writes. If I used NIO, coordinating the NIO thread with the data-processing threads might bring additional complexity (e.g. post-operations). So I think this question is meaningful for time-critical (stream-based, multi-channel network) applications like mine.
Answer 1
Score: 1
If I understand your use case correctly, this is a common problem in concurrent programming. One solution is the ring buffer approach, which usually addresses both the synchronization problem and the problem of creating too many objects.
You can find a good implementation of this in the LMAX Disruptor library. See https://lmax-exchange.github.io/disruptor/ to learn more about it. But keep in mind that it is not magic and must be adapted to your use case.
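To make the ring buffer idea concrete, here is a minimal single-producer/single-consumer sketch in plain Java. It is not the Disruptor API and leaves out most of what that library provides; the SpscRingBuffer class and its method names are hypothetical. It assumes exactly one thread calls offer and exactly one other thread calls poll.

```java
import java.util.concurrent.atomic.AtomicLong;

// Single-producer / single-consumer ring buffer sketch (not the Disruptor API).
final class SpscRingBuffer<T> {
    private final Object[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next sequence to read
    private final AtomicLong tail = new AtomicLong(); // next sequence to write

    SpscRingBuffer(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new Object[capacity];
        mask = capacity - 1;
    }

    // Producer thread only. Returns false when the buffer is full.
    boolean offer(T item) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = item;
        tail.set(t + 1); // volatile write publishes the slot to the consumer
        return true;
    }

    // Consumer thread only. Returns null when the buffer is empty.
    @SuppressWarnings("unchecked")
    T poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        int index = (int) (h & mask);
        T item = (T) slots[index];
        slots[index] = null; // let the consumed item be garbage collected
        head.set(h + 1);     // volatile write frees the slot for the producer
        return item;
    }
}
```

This sketch still passes object references through the buffer. To avoid allocating per operation, the slots could instead hold preallocated, reusable event objects that the producer fills in place.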