Spring Batch - Re-triggering Service Activator

Question

We have a Spring Batch application that runs in Master-Slave mode. At the master's end there is a Reader that queries the source database and processes the records before pushing them into an intermediate table.

While this happens, the master also launches the slaves in parallel. Each slave reads the records in the intermediate table that belong to its partition and processes them concurrently with the others. There are 5 slaves, each responsible for a distinct partition of the records in the intermediate table.

The Master and Slaves run in different JVMs.

The master uses MessageChannelPartitionHandler to communicate with the slaves. At the slave's end, a Service Activator triggers the step that reads and processes the records from the intermediate table. The Service Activator is triggered when a partition message arrives on the designated channel. Once the step completes, each slave acknowledges back to the master through the reply channel.
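To make the request/reply flow above concrete, here is a minimal in-process sketch in plain Java. It is only an analogy: it assumes two `BlockingQueue`s can stand in for the request and reply channels and that pool threads can stand in for the slave-side Service Activators. It does not use the Spring Integration or Spring Batch APIs, and the names `PartitionRequestReplyDemo`, `PartitionReply` and `runWorkerStep` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// In-process analogy only: the two queues stand in for the request and reply
// channels, and each pool thread plays a slave-side service activator.
final class PartitionRequestReplyDemo {

    // Hypothetical reply payload: which partition finished and how many records it handled.
    static final class PartitionReply {
        final int partition;
        final int processed;

        PartitionReply(int partition, int processed) {
            this.partition = partition;
            this.processed = processed;
        }
    }

    static List<PartitionReply> run(int gridSize) {
        BlockingQueue<Integer> requests = new LinkedBlockingQueue<>(); // partition numbers
        BlockingQueue<PartitionReply> replies = new LinkedBlockingQueue<>();
        ExecutorService workers = Executors.newFixedThreadPool(gridSize);
        try {
            // Slave side: take a partition request, run the step, send a reply.
            for (int i = 0; i < gridSize; i++) {
                workers.submit(() -> {
                    int partition = requests.take();
                    replies.put(new PartitionReply(partition, runWorkerStep(partition)));
                    return null;
                });
            }
            // Master side: send one request per partition...
            for (int p = 0; p < gridSize; p++) {
                requests.put(p);
            }
            // ...then block until every partition has replied (or give up after a timeout).
            List<PartitionReply> collected = new ArrayList<>();
            for (int p = 0; p < gridSize; p++) {
                PartitionReply reply = replies.poll(10, TimeUnit.SECONDS);
                if (reply == null) {
                    throw new IllegalStateException("timed out waiting for partition replies");
                }
                collected.add(reply);
            }
            return collected;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            workers.shutdownNow();
        }
    }

    // Hypothetical worker step: pretend every partition holds 10 records.
    static int runWorkerStep(int partition) {
        return 10;
    }

    public static void main(String[] args) {
        for (PartitionReply r : run(5)) {
            System.out.println("partition " + r.partition + " processed " + r.processed);
        }
    }
}
```

In this sketch the master treats the partitioned work as done as soon as it holds one reply per partition, so a worker that replies early contributes nothing more to that run.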

Suppose there is an issue at the master's end while pulling records from the source database, or a network delay while inserting the records into the intermediate table. The slaves do not see any new records for their partition, so their readers close automatically and they prematurely start sending replies to the master.

However, the processing at the master's end is not complete yet. After the slave steps have completed, new records can still arrive in the intermediate table. When this happens, all of these spilled-over records can only be processed during the next Job run.
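The premature completion can be reproduced deterministically, without threads. The sketch below assumes a hypothetical in-memory list of `Row` objects in place of the intermediate table, and a reader that, like a Spring Batch `ItemReader`, returns `null` to signal the end of input. The workers drain whatever rows exist at that moment and reply; any rows the master inserts afterwards are left for the next run.

```java
import java.util.ArrayList;
import java.util.List;

// Deterministic, single-threaded illustration of the premature-completion problem.
final class SpilloverDemo {

    // Hypothetical row of the intermediate table.
    static final class Row {
        final int partition;
        boolean processed;

        Row(int partition) {
            this.partition = partition;
        }
    }

    // Like a Spring Batch ItemReader, returning null means "no more input",
    // which is what ends the worker's step.
    static Row readNext(List<Row> table, int partition) {
        for (Row row : table) {
            if (row.partition == partition && !row.processed) {
                return row;
            }
        }
        return null;
    }

    // Runs one worker step against whatever rows exist at this moment.
    static int runWorkerStep(List<Row> table, int partition) {
        int count = 0;
        Row row;
        while ((row = readNext(table, partition)) != null) {
            row.processed = true;
            count++;
        }
        return count;
    }

    // Master-side insert: rows are spread round-robin over the partitions.
    static void stage(List<Row> table, int rows, int gridSize) {
        for (int i = 0; i < rows; i++) {
            table.add(new Row(table.size() % gridSize));
        }
    }

    /** Returns the number of rows left unprocessed for the next job run. */
    static int simulate(int gridSize, int earlyRows, int lateRows) {
        List<Row> table = new ArrayList<>();
        stage(table, earlyRows, gridSize);   // master's first inserts
        for (int p = 0; p < gridSize; p++) {
            runWorkerStep(table, p);         // workers drain their partition and reply
        }
        stage(table, lateRows, gridSize);    // delayed inserts arrive too late
        int leftover = 0;
        for (Row row : table) {
            if (!row.processed) {
                leftover++;
            }
        }
        return leftover;
    }

    public static void main(String[] args) {
        System.out.println("rows left for next run: " + simulate(5, 20, 7));
    }
}
```

With 20 rows staged up front and 7 delayed rows, `simulate(5, 20, 7)` leaves exactly the 7 late rows unprocessed.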

Is there a way for the master to trigger the Service Activator on the slaves again when this happens? In other words, can we force the slaves to wait until the master's processing is fully complete and all the records are available in the intermediate table before they send their reply messages to the master?
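One way to picture "make the slaves wait" is a one-shot gate that the workers block on until the master signals that staging is complete. The sketch below uses `java.util.concurrent.CountDownLatch` inside a single JVM, with a `CopyOnWriteArrayList` of partition keys standing in for the intermediate table. It is only an analogy under that assumption, since a latch cannot span the separate master and slave JVMs described above; across JVMs the same ordering would have to come from the job design, for example by finishing the staging work before the partition requests are sent. The class name `WaitForStagingDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Single-JVM analogy: a latch gates the workers until staging is complete.
final class WaitForStagingDemo {

    /** Stages {@code rows} rows, then returns how many the workers processed in total. */
    static int run(int gridSize, int rows) {
        List<Integer> table = new CopyOnWriteArrayList<>(); // each entry is a row's partition key
        CountDownLatch stagingDone = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        List<Future<Integer>> results = new ArrayList<>();
        try {
            for (int p = 0; p < gridSize; p++) {
                final int partition = p;
                results.add(pool.submit(() -> {
                    stagingDone.await(); // block instead of reading a half-filled table
                    int count = 0;
                    for (int key : table) {
                        if (key == partition) {
                            count++;
                        }
                    }
                    return count;
                }));
            }
            // Master side: staging may be slow, but no worker reads until the latch opens.
            for (int i = 0; i < rows; i++) {
                table.add(i % gridSize);
            }
            stagingDone.countDown(); // signal that the intermediate table is complete
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(10, TimeUnit.SECONDS);
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("processed " + run(5, 23) + " of 23 rows");
    }
}
```

`run(5, 23)` returns 23: every worker starts reading only after `countDown()`, so no staged row is missed.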

Answer 1

Score: 1

> After the slave steps are completed there can be some more new records in the intermediate table

Batch processing is about processing fixed data sets. If the data source is moving, it becomes stream processing. Assigning a fixed data set for a job is what enables restartability.

With your design, the partitions are not fixed. So you need to make sure each slave processes a fixed set of records, or use a streaming solution instead.
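Following the advice above, a fixed data set can be obtained by letting the master finish staging, reading the current ID bounds once, and handing each slave a fixed, non-overlapping ID range. The plain-Java sketch below assumes the intermediate table has a numeric ID column; `minId`, `maxId` and the class name `FixedRangePartitioner` are hypothetical. In Spring Batch, this kind of logic would typically live in a `Partitioner` implementation, whose `partition(int gridSize)` method returns one `ExecutionContext` per partition in which the range bounds can be stored for the worker's reader.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch: split a frozen ID snapshot into fixed, non-overlapping ranges.
final class FixedRangePartitioner {

    static final class Range {
        final long minId; // inclusive
        final long maxId; // inclusive

        Range(long minId, long maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }

        @Override
        public String toString() {
            return "[" + minId + ", " + maxId + "]";
        }
    }

    // minId/maxId would be read from the intermediate table only after the master's
    // staging has finished, so the data set is fixed for this job run.
    static List<Range> partition(long minId, long maxId, int gridSize) {
        if (gridSize <= 0) {
            throw new IllegalArgumentException("gridSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        if (maxId < minId) {
            return ranges; // nothing staged
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        for (long start = minId; start <= maxId; start += size) {
            ranges.add(new Range(start, Math.min(start + size - 1, maxId)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(partition(1, 23, 5));
    }
}
```

`partition(1, 23, 5)` yields `[1, 5]`, `[6, 10]`, `[11, 15]`, `[16, 20]` and `[21, 23]`. If IDs are assigned in increasing order, rows inserted after the bounds were read fall outside every range and are picked up by the next job run instead of being half-processed.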

huangapple
  • Published on April 7, 2020 08:04:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/61070785.html