Kafka - How to recover if a partition is lost?
Question
I have 4 Kafka nodes in a cluster, one topic split into 40 partitions with a replication factor of 2. The Kafka version is 2.3.1.
How can I recover from a situation where two Kafka nodes die at the same time, it is not possible to restart them, and their Kafka logs are lost?
I'm sure I will lose some data, because some partitions are gone (some partitions had replicas only on the dead nodes).
I tried adding two new Kafka nodes and reassigning partitions across all 4 available Kafka nodes. However, the lost partitions were not reassigned to the two new nodes, and clients cannot publish data that maps to the lost partitions.
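For reference, the reassignment attempt described above roughly corresponds to the standard partition-reassignment workflow. A sketch for Kafka 2.3.x, which still drives the tool through ZooKeeper; the topic name, file names, and ZooKeeper address are placeholders:

```shell
# List the topics to move (topics.json is a hypothetical file name).
cat > topics.json <<'EOF'
{"version": 1, "topics": [{"topic": "mytopic"}]}
EOF

# Generate a candidate plan spreading partitions over brokers 1-4.
kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --topics-to-move-json-file topics.json \
  --broker-list "1,2,3,4" --generate

# Save the proposed assignment to reassignment.json, then execute and verify.
kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file reassignment.json --execute
kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file reassignment.json --verify
```

Note that reassignment cannot complete for a partition with no live replica to copy from, which is consistent with the behavior described in the question.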
Answer 1
Score: 5
Kafka recovers lost partitions on its own only if those partitions still have at least one live replica that was previously in sync. Otherwise, unclean.leader.election.enable must be turned on for the brokers so that the leader can be moved to an out-of-sync replica.
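For example, unclean leader election can be enabled cluster-wide via server.properties, or per topic with kafka-configs.sh. A sketch for Kafka 2.3.x; the topic name and ZooKeeper address are placeholders:

```shell
# Per-topic override (also settable cluster-wide by adding
# unclean.leader.election.enable=true to server.properties on each broker).
kafka-configs.sh --zookeeper zk1:2181 --alter \
  --entity-type topics --entity-name mytopic \
  --add-config unclean.leader.election.enable=true
```

Be aware that this promotes an out-of-sync replica to leader, so any messages the old leader had not yet replicated to it are lost.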
Since the partitions had only 2 replicas and you lost 2 nodes, you may lose some partitions.
You can raise the replication factor from 2 to 4 for better resilience.
The two new nodes should be given the same broker IDs as the dead ones, so that the partitions assigned to those IDs can pull replicas again.
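To illustrate raising the replication factor, a reassignment plan in the JSON format consumed by kafka-reassign-partitions.sh can be built programmatically. A minimal Python sketch, where the topic name, broker IDs, and current assignment are assumptions for illustration:

```python
import json

def expand_replication(partitions, brokers, target_rf):
    """Pad each partition's replica list up to target_rf, picking
    brokers that do not already host a replica (round-robin by
    partition id so new replicas spread across the cluster)."""
    plan = {"version": 1, "partitions": []}
    for p in partitions:
        replicas = list(p["replicas"])
        candidates = [b for b in brokers if b not in replicas]
        offset = p["partition"] % max(len(candidates), 1)
        rotated = candidates[offset:] + candidates[:offset]
        replicas += rotated[: target_rf - len(replicas)]
        plan["partitions"].append(
            {"topic": p["topic"], "partition": p["partition"], "replicas": replicas}
        )
    return plan

# Hypothetical current assignment: replication factor 2 on brokers 1-3.
current = [
    {"topic": "mytopic", "partition": 0, "replicas": [1, 2]},
    {"topic": "mytopic", "partition": 1, "replicas": [2, 3]},
]
plan = expand_replication(current, brokers=[1, 2, 3, 4], target_rf=4)
print(json.dumps(plan, indent=2))
```

The printed JSON can then be passed to kafka-reassign-partitions.sh with --reassignment-json-file and --execute.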