Cannot get leader of topic "test" partition
Question
I am performing a rolling upgrade of the Kafka brokers from 2.2.1 to 3.2.0 (Confluent Platform 5.2.2 to 7.2.1). The cluster has three pods: kafka-0, kafka-1 and kafka-2. The replication factor for this topic is 2. During the rolling upgrade I am seeing downtime because no leader is assigned for a partition of the "test" topic:
> Cannot get leader of topic test partition 4: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes
When I added logging, I could see Partition leader=-1.
I tried increasing the replication factor for one of the topics and that worked, but I have no concrete explanation for why this happened.
Answer 1
Score: 0
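You probably have min.insync.replicas=2 and unclean.leader.election.enable=false. With those settings, if the other replica was not fully caught up (not in the ISR) when the current leader went down, it is not allowed to take the leader's place, so the partition has no leader until the old leader comes back.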
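To make that reasoning concrete, here is a deliberately simplified model of the election rule in Python. It is not Kafka's controller code: the function name, its parameters and the broker ids 0, 1 and 2 (taken from the pod names) are assumptions made for illustration only.

```python
NO_LEADER = -1  # the value the asker saw logged as "Partition leader=-1"


def elect_leader(replicas, isr, live_brokers, unclean_election=False):
    """Return the broker id that would become leader, or NO_LEADER.

    replicas:      assigned replicas, in assignment order
    isr:           replicas that were in sync before the failure
    live_brokers:  brokers that are currently up
    """
    live = set(live_brokers)
    in_sync = set(isr)

    # Clean election: first live replica that is also in the ISR.
    for broker in replicas:
        if broker in live and broker in in_sync:
            return broker

    # Unclean election: any live replica, even if it is behind.
    if unclean_election:
        for broker in replicas:
            if broker in live:
                return broker

    return NO_LEADER


# Replication factor 2, follower 1 had dropped out of the ISR,
# then leader 0 is restarted by the rolling upgrade.
print(elect_leader([0, 1], isr=[0], live_brokers=[1, 2]))

# Replication factor 3: broker 2 is still in sync and can take over.
print(elect_leader([0, 1, 2], isr=[0, 2], live_brokers=[1, 2]))
```

In the first call there is no eligible replica, which matches the leader=-1 symptom. One way a follower can be out of the ISR during a rolling upgrade is that it was itself restarted a moment earlier and has not caught up yet, so waiting for under-replicated partitions to clear before restarting the next pod may help.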
You have 3 brokers, so there is no reason not to use a replication factor of 3 other than to save cost, and the tradeoff of staying at 2 is unavailability whenever you lose any one replica.
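If you want to raise the replication factor of an existing topic, one approach is a partition reassignment. The sketch below only builds the JSON document that the kafka-reassign-partitions tool reads through --reassignment-json-file. The partition count of 6 and the broker ids 0, 1 and 2 are assumptions (the question only shows that partition 4 exists), and the helper name is made up for this example.

```python
import json


def reassignment_plan(topic, partitions, brokers, replication_factor):
    """Build a reassignment plan that spreads replicas over the brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the broker count")

    entries = []
    for partition in range(partitions):
        # Rotate the starting broker so preferred leaders are spread evenly.
        replicas = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
        entries.append(
            {"topic": topic, "partition": partition, "replicas": replicas}
        )
    return {"version": 1, "partitions": entries}


if __name__ == "__main__":
    # Assumed values: 6 partitions, broker ids 0..2.
    plan = reassignment_plan("test", 6, [0, 1, 2], 3)
    print(json.dumps(plan, indent=2))
```

The first broker in each replicas list is the preferred leader, so this rotation may change which broker leads a partition. Check the output against your current assignment before executing it.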
Answer 2
Score: 0
The producer should retry until a new leader is elected. Make sure you have acks=all and min.insync.replicas=2, as @OneCricketeer suggested.
You should still expect to see "NotLeaderException"-style errors (NotLeaderOrFollowerException in recent Kafka versions) during upgrades or broker failures. That is normal: as long as the producer retries and the above configurations are set, there won't be data loss.
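As a rough illustration of "retry until a new leader is elected", here is a small simulation in Python. It does not use a real Kafka client: LeaderNotAvailable, send_with_retry and make_flaky_send are hypothetical names, and the retry count and backoff are arbitrary. Real clients normally do this for you through their own settings (for example retries and retry.backoff.ms in the Java producer), so treat this only as a sketch of the behaviour.

```python
import time


class LeaderNotAvailable(Exception):
    """Hypothetical stand-in for a client's retriable 'no leader' error."""


def send_with_retry(send, message, retries=5, backoff_s=0.1, sleep=time.sleep):
    """Call send(message), retrying with exponential backoff on no-leader errors."""
    for attempt in range(retries + 1):
        try:
            return send(message)
        except LeaderNotAvailable:
            if attempt == retries:
                raise  # give up: the election took longer than we allowed
            sleep(backoff_s * (2 ** attempt))


def make_flaky_send(failures):
    """Simulated send that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def send(message):
        if state["left"] > 0:
            state["left"] -= 1
            raise LeaderNotAvailable("partition leader=-1")
        return f"acked:{message}"

    return send


if __name__ == "__main__":
    # The leader is missing for the first two attempts, then elected.
    print(send_with_retry(make_flaky_send(2), "hello"))
```

The retry budget has to cover the time a broker restart plus election takes in your cluster. If it is too small, the producer surfaces the error to the application even though the outage is temporary.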