Two-phase commit protocol with two sinks
Question
I would like more detail on how the two-phase commit (2PC) protocol works in Flink when a job has more than one sink. I am interested in two cases:
- Both sinks support 2PC.
- One sink supports 2PC and the other does not.
Is the distributed transaction guaranteed to span all sinks, or does each sink have its own transaction? In other words, if both sinks support 2PC and one of them fails while the other is able to commit, what happens?
Answer 1
Score: 1
In Flink, each sink is responsible for its own state management, and that includes any 2PC implementation, so each sink runs its own transaction rather than sharing one transaction across all sinks. This division is necessary because some sinks do not support 2PC at all.
When a checkpoint is triggered in Flink, each 2PC sink starts a pre-commit. The checkpoint proceeds only if that pre-commit succeeds. Once the checkpoint of the whole execution graph has been taken successfully (the state of all operators and UDFs has been stored), the sinks perform the actual commit as the last phase of the checkpoint.
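Coming back to your question: if any sink fails during pre-commit, the checkpoint fails. If a sink fails during the commit itself, the whole Flink application fails and restarts from the last successfully completed checkpoint, and on restart the sink retries the transactions that were pre-committed as part of that checkpoint.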
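The checkpoint flow described above can be sketched as a small simulation. This is only an illustrative model in plain Python, not Flink code: `ToyTwoPhaseSink`, `run_checkpoint`, and the `fail_on_pre_commit` flag are hypothetical names invented for this example, and the model assumes a failed pre-commit simply aborts every sink's open transaction. It does not model a failure during the commit phase itself.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTwoPhaseSink:
    """Hypothetical stand-in for a 2PC sink; this is not a Flink class."""
    name: str
    fail_on_pre_commit: bool = False  # assumption: lets us inject a failure
    open_txn: list = field(default_factory=list)       # written since last checkpoint
    pre_committed: list = field(default_factory=list)  # flushed, awaiting commit
    committed: list = field(default_factory=list)      # visible to readers

    def write(self, record):
        self.open_txn.append(record)

    def pre_commit(self):
        # Phase 1: flush the open transaction so it can be committed later.
        if self.fail_on_pre_commit:
            raise RuntimeError(f"{self.name}: pre-commit failed")
        self.pre_committed, self.open_txn = self.open_txn, []

    def commit(self):
        # Phase 2: make the pre-committed records visible.
        self.committed.extend(self.pre_committed)
        self.pre_committed = []

    def abort(self):
        # Discard everything that has not been committed yet.
        self.open_txn, self.pre_committed = [], []


def run_checkpoint(sinks):
    """Return True if every sink pre-committed and then committed."""
    try:
        for sink in sinks:  # phase 1: each sink pre-commits its own transaction
            sink.pre_commit()
    except RuntimeError:
        for sink in sinks:  # checkpoint failed: nothing becomes visible
            sink.abort()
        return False
    for sink in sinks:      # phase 2: runs only after the checkpoint completed
        sink.commit()
    return True


if __name__ == "__main__":
    good, bad = ToyTwoPhaseSink("good"), ToyTwoPhaseSink("bad", fail_on_pre_commit=True)
    for sink in (good, bad):
        sink.write("r1")
    print(run_checkpoint([good, bad]), good.committed, bad.committed)
```

Each sink keeps its own transaction, and the checkpoint is the only thing that ties them together: in this model, the healthy sink's pre-committed data is discarded when the other sink's pre-commit fails, so neither sink exposes records from a checkpoint that did not complete.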