英文:
Kafka "Exactly Once" semantics and max-poll-records
问题
I'm kind of confused about max.poll.records config in Apache Kafka Consumer regarding "Exactly Once" semantics in Apache Kafka.
Based on my research the "Exactly Once" semantics has nothing to do with max.poll.records config.
But if I set to max.poll.records=1, would it "reduce" the probability of possible duplication and the number of duplication?
英文:
I'm kind of confused about max.poll.records config in Apache Kafka Consumer regarding "Exactly Once" semantics in Apache Kafka.
Based on my research the "Exactly Once" semantics has nothing to do with max.poll.records config.
But if I set to max.poll.records=1, would it "reduce" the probability of possible duplication and the number of duplication?
答案1
得分: 0
不会,这不会有任何区别,因为"仅一次"语义主要用于控制发送到Kafka的数据的事务行为。
由于Kafka的分布式特性,它最初支持至少一次(在重试情况下)和至多一次(无重试)语义。
以下是来自kafka文档的简要定义:
至多一次 — 消息可能会丢失,但不会重新传递。
至少一次 — 消息不会丢失,但可能会重新传递。
仅一次 — 这才是人们真正想要的,每条消息只传递一次。
话虽如此,从消费者的角度来看,无论它轮询了1条还是多条记录,这并不保证这些记录是否为重复记录,这完全取决于生产者在交付保证方面考虑了什么。
我强烈建议您阅读以下文章,这将更加清楚:
英文:
No, this wouldn't make any difference, as the exactly once semantic mainly to control the transactional behavior of the data being sent to kafka.
Due to the distributed nature of kafka, it was originally supporting at-least-once (in case retrials) and at-most-once (no retrials) semantic.
Below is the very nutshell definition from kafka docs
>At most once—Messages may be lost but are never redelivered.
>At least once—Messages are never lost but may be redelivered.
>Exactly once—this is what people actually want, each message is delivered once and only once.
Being that said, from the consumer side, whether it will poll 1 or more records, this doesn't guarantee whether these records are duplicated records or not, it totally depends on the delivery guarantees you consider in the producer.
I highly recommend you to go through the below articles, will make things much more clear
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论