英文:
Kafka message delivery semantic
问题
我正在阅读有关消费者的Kafka文档,遇到了以下有关消息消费定义的内容:
我们的主题被划分为一组完全有序的分区,每个分区在任何给定时间内都由订阅的消费者组中的一个消费者进行消费。这意味着每个分区中的消费者位置只是一个单独的整数,即下一个要消费的消息的偏移量。
我对这些措辞的理解如下:
消费者组从由多个分区组成的主题中读取数据。然后,组中的每个消费者被分配一些不与组中其他消费者的分区重叠的分区子集。
考虑以下情况:
由2个消费者C1
和C2
组成的消费者组GRP
从由2个分区P1
和P2
组成的主题TPC
中读取数据。
问题: 如果某个时刻C1
从P1
读取,而C2
从P2
读取,是否可以重新平衡,使得C1
开始从P2
读取,而C2
从P1
读取?如果可以,在什么条件下可能发生?
这与上述引用并不矛盾。
英文:
I'm reading Kafka documentation about consumers and faced the following message consumption definition:
> Our topic is divided into a set of totally ordered partitions, each of
> which is consumed by exactly one consumer within each subscribing
> consumer group at any given time. This means that the position of
> a consumer in each partition is just a single integer, the offset
> of the next message to consume.
I interpreted the wording as follows:
A consumer group reads data from a topic consisting of a number of partitions. Then each consumer from the group is assigned with some subset of partitions that do not overlap with other consumer's partitions from the group.
Consider the following case:
A consumer group GRP
consisting of 2 consumers C1
and C2
reads data from a topic TPC
consisting of 2 partitions P1
and P2
.
QUESTION: If at some point C1
reads from P1
and C2
reads from P2
can it be rebalanced so that C1
starts reading from P2
and C2
from P1
. If so under which condition may that happen?
It does not contradict to the quote above.
答案1
得分: 1
-
你对所引用段落的解释是正确的。
-
关于问题“如果是这样,什么条件下可能会发生?”:
是的,这种情况是可能发生的。消费者分配到一个TopicPartition的变化主要是通过重新平衡触发的。消费者重新平衡将在以下情况下触发:
当消费者重新平衡被启动时,
-
某个消费者离开了消费者组(要么是因为未能及时发送心跳,要么是明确请求离开)
-
有新的消费者加入消费者组
-
消费者改变了其订阅的Topic
-
消费者组注意到任何已订阅Topic的Topic元数据发生了变化
(例如,分区数量增加)
【来源:Confluent Kafka开发者培训资料】
请记住,在重新平衡期间,所有消费者都将被暂停。
- 关于您的评论:“C1从P1中读取了一些消息,但没有提交偏移量。然后它与Kafka失去连接并成功处理了该消息。与此同时,另一个消费者C3被创建并分配给P1,读取相同的消息。”
我认为这种情况与消费者重新平衡无关,因为您的消费者C1在处理数据后可能会突然关闭,但在将数据提交回Kafka之前会失败。现在,如果您重新启动消费者C1,它将再次读取相同的消息,因为它尚未提交这些消息。
这被称为“至少一次”交付语义,与启用了自动提交的“至多一次”语义不同。我猜您正在寻找分布式系统中的“圣杯”,即“仅一次语义”
要实现这一点,您需要从Kafka到应用程序的终点考虑整个应用程序。如果您的应用程序的输出不是幂等的,那么您可能无法实现仅一次语义(EOS)。但是,如果您的输出终点例如再次是Kafka,您实际上可以实现EOS。
英文:
I see a few things to be discussed in your question and comment.
-
Your interpretation of the quoted paragraph is correct.
-
Question "If so under which condition may that happen?":
Yes, this scenario can happen. A change in the assignment of a consumer to a TopicPartition is mainly triggered through a rebalancing. A consumer rebalance will be triggered in the following cases:
Consumer rebalances are initiated when
-
A Consumer leaves the Consumer group (either by failing to send a timely heartbeat or by explicitly requesting to leave)
-
A new Consumer joins the Consumer Group
-
A Consumer changes its Topic subscription
-
The Consumer Group notices a change to the Topic metadata for any subscribed Topic
(e.g. an increase in the number of Partitions)
[Source: Training Material of Confluent Kafka Developer]
Keep in mind, that during a Rebalance all consumers are paused.
- Your comment "C1 read some message from P1 without commiting offset. Then it loses the connection to Kafka and processes the message succesfully. At the same time the other consumer C3 is created and assigned to the P1 reading the same message."
I see this scenario unrelated to a consumer rebalance, as your consumer C1 could just die after processing the data but before committing the back to Kafka. Now, if you restart the consumer C1 it will read the same messages again because it did not yet commit them.
This is called "at-least-once" delivery semantics and is different to "at-most-once" semantics when you have e.g. auto.commit enabled. I guess you are looking for the "holy grail" in distributed systems which is "exactly-once-semantics"
For this to achieve you need to consider the entire application from Kafka to the sink of your application. If the output of your application is not idempotent you are likely not able to achieve exactly-once semantics (EOS). But if your output sink e.g. is Kafka again you actually can achieve EOS.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论