Is Talend's OnComponentOk thread safe?

Question


I have a question about parallelism in Talend. I am building an ETL job that deletes multiple files in parallel and then updates a database table.

My job runs: tFlowToIterate === iterate in parallel x10 ===> tDelete ===OnComponentOk===> tDBRow

My tDBRow requires variables defined in the tFlowToIterate, and it works! However, I am unclear on the logic behind why it works. How does Talend ensure that each tDBRow sees the appropriate value in this setup?

My theory is that a tDBRow on an OnComponentOk link acts like a child that runs within the tDelete iteration.

Can anyone explain how and why this works?

Answer 1

Score: 1


When Talend starts a new thread, it creates a copy of the globalMap. Your values are in the globalMap, so each tDBRow gets its own copy of it. (If you store a lot of data in the globalMap, this can result in higher memory usage.)
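
The copy-per-thread idea can be sketched in plain Java. This is not Talend's generated code, only an illustration under assumptions: the class name `GlobalMapCopySketch`, the key `current_file`, and the `files` table in the SQL string are all hypothetical. Each task gets a snapshot of the parent map taken at the moment it is created, so it keeps seeing its own file name even after the parent map has moved on to the next iteration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GlobalMapCopySketch {

    // Submits one task per file name. Each task reads "current_file" from its
    // own snapshot of the parent map, never from the parent map itself.
    public static List<String> run(List<String> fileNames) {
        Map<String, Object> parentMap = new HashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(10);
        List<Future<String>> futures = new ArrayList<>();
        try {
            for (String name : fileNames) {
                // The iterating component overwrites the same key on every iteration.
                parentMap.put("current_file", name);
                // Snapshot taken before the task starts (hypothetical stand-in
                // for the per-thread globalMap copy described above).
                final Map<String, Object> threadMap = new HashMap<>(parentMap);
                futures.add(pool.submit(() -> {
                    String deleted = (String) threadMap.get("current_file");
                    // Stand-in for the tDelete -> OnComponentOk -> tDBRow chain.
                    return "UPDATE files SET deleted = 1 WHERE name = '" + deleted + "'";
                }));
            }
            List<String> statements = new ArrayList<>();
            for (Future<String> f : futures) {
                statements.add(f.get()); // results come back in submission order
            }
            return statements;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (String s : run(java.util.Arrays.asList("a.csv", "b.csv", "c.csv"))) {
            System.out.println(s);
        }
    }
}
```

In this sketch the statement built by each task always names the file that was current when that task's snapshot was taken, which matches the behaviour the question observes.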

What you need to be careful about is the other direction, because the globalMap is write-synchronized: if you put a value into it inside a thread, that value is written to the parent as well. In practice this means that if you try to increment a variable in the globalMap, the parallel threads will all see the value 0 and each write back 1, so the 11th thread will start with 1.
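
The lost increment described above can be reproduced with a small plain-Java sketch. It only models the behaviour as the answer describes it (snapshot on thread start, write-back on put) and is not Talend's actual implementation; the class name `LostUpdateSketch` and the key `counter` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LostUpdateSketch {

    // Starts `workers` threads that each increment "counter" once and
    // returns the value left in the parent map afterwards.
    public static int run(int workers) {
        Map<String, Object> parent = Collections.synchronizedMap(new HashMap<>());
        parent.put("counter", 0);

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            // Every snapshot is taken before any thread runs, so all copies hold 0.
            Map<String, Object> copy = new HashMap<>(parent);
            threads.add(new Thread(() -> {
                int next = (Integer) copy.get("counter") + 1; // always 0 + 1
                copy.put("counter", next);
                parent.put("counter", next); // write-back: synchronized, but still 1
            }));
        }
        for (Thread t : threads) {
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return (Integer) parent.get("counter");
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // prints 1, not 10
    }
}
```

Synchronizing the write does not help here: each write is individually safe, but every thread computed its new value from the same stale snapshot.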

To avoid this (e.g. you put(hadError, true) and then check hadError later), make sure that when a thread starts you initialize the values your thread depends on. Or, if the logic is more complex, make that subjob a separate job, so the globalMap can't become corrupted.
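
A minimal plain-Java sketch of the "initialize at thread start" advice, again as an illustration rather than Talend's generated code. The class name `InitFlagSketch` and the boolean list that simulates which deletes fail are hypothetical. The parent map may carry a stale `hadError` left by earlier work; each task resets the flag in its own copy before doing anything else.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InitFlagSketch {

    // staleFlag simulates a value left in the parent map by earlier work.
    // deleteFails says, per task, whether its simulated delete fails.
    public static List<Boolean> run(boolean staleFlag, List<Boolean> deleteFails) {
        Map<String, Object> parent = new HashMap<>();
        parent.put("hadError", staleFlag);

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Boolean>> futures = new ArrayList<>();
        try {
            for (Boolean fails : deleteFails) {
                Map<String, Object> copy = new HashMap<>(parent);
                futures.add(pool.submit(() -> {
                    copy.put("hadError", false); // reset first, ignoring any inherited value
                    if (fails) {
                        copy.put("hadError", true); // set only by this task's own failure
                    }
                    return (Boolean) copy.get("hadError");
                }));
            }
            List<Boolean> flags = new ArrayList<>();
            for (Future<Boolean> f : futures) {
                flags.add(f.get());
            }
            return flags;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Even with a stale "true" in the parent, only the failing task reports an error.
        System.out.println(run(true, java.util.Arrays.asList(false, true, false)));
    }
}
```

Without the reset line, every task in this sketch would inherit the stale `true` and report an error regardless of its own outcome.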

huangapple
  • Posted on September 11, 2020, 08:37:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63839330.html