A few Kafka partitions are not getting assigned to any Flink consumer
Question
I have a Kafka topic with 15 partitions [0-14], and I'm running Flink with a parallelism of 5. So ideally each parallel Flink consumer should consume 3 partitions. But even after multiple restarts, a few of the Kafka partitions are not subscribed to by any Flink worker.
org.apache.kafka.clients.consumer.KafkaConsumer assign Subscribed to partition(s): topic_name-13, topic_name-8, topic_name-9
org.apache.kafka.clients.consumer.KafkaConsumer assign Subscribed to partition(s): topic_name-11, topic_name-12, topic_name-13
org.apache.kafka.clients.consumer.KafkaConsumer assign Subscribed to partition(s): topic_name-14, topic_name-0, topic_name-10
org.apache.kafka.clients.consumer.KafkaConsumer assign Subscribed to partition(s): topic_name-5, topic_name-6, topic_name-10
org.apache.kafka.clients.consumer.KafkaConsumer assign Subscribed to partition(s): topic_name-2, topic_name-3, topic_name-7
The logs above show that partitions 10 and 13 have each been subscribed to by two consumers, while partitions 1 and 4 are not subscribed to at all.
Note: If I start the job with a parallelism of 1, it works perfectly fine.
Flink Version: 1.3.3
Answer 1
Score: 0
This sounds like https://issues.apache.org/jira/browse/FLINK-7143.
Reading through the details in the Jira ticket and in the pull request (https://github.com/apache/flink/pull/4301), it sounds like if you are on Flink 1.3.x you can only benefit from this bug fix if you do a fresh restart. Restarting from a savepoint isn't enough to benefit from the fix.
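For intuition, the fix in FLINK-7143 makes the partition-to-subtask mapping deterministic: a start index is derived from the topic name, and partitions are then distributed round-robin, so every subtask computes the same assignment without coordination and no partition is duplicated or skipped. The sketch below illustrates the idea in Python; the hash function and names are illustrative, not Flink's actual implementation.

```python
from collections import Counter

def assign_partition(topic: str, partition: int, num_subtasks: int) -> int:
    """Map a Kafka partition to a Flink subtask index.

    Sketch of the deterministic round-robin scheme introduced by
    FLINK-7143: a start index is derived from the topic name, then
    partitions are spread evenly. The hash here is illustrative only.
    """
    start_index = sum(topic.encode()) % num_subtasks
    return (start_index + partition) % num_subtasks

# 15 partitions across 5 subtasks: each subtask gets exactly 3 partitions,
# with no partition assigned twice or left unassigned.
assignments = [assign_partition("topic_name", p, num_subtasks=5)
               for p in range(15)]
counts = Counter(assignments)
```

With a scheme like this, the duplicated (10, 13) and missing (1, 4) partitions seen in the question's logs cannot occur, since each partition maps to exactly one subtask.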