A multi-instance microservice processes a unique Kafka message more than once when Kafka rebalances the partitions

Question

In production, we have a microservice with two instances that consume from the same topic with the same group id (and different client ids).
As a result, if we have 5 partitions, the first instance takes three and the other one takes two.

But when we deploy to production:

  • We shut down the first instance.
  • Then the first instance is up and running again.
  • Then we deploy the second instance and shut it down.
  • Finally, the second instance is up and running again.

This process makes Kafka rebalance the partitions each time an instance is shut down.

The problem is that after this we process a unique message from a Kafka partition twice (I checked that there are no duplicates in the Kafka partitions; each message is unique).
I think that during rebalancing an instance does not manage to commit some messages it has already processed, and the other instance processes them again. Note that we have set

AutoOffsetReset = AutoOffsetReset.Earliest,
EnableAutoCommit = false,
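
For context, a minimal sketch of the full consumer configuration these two settings would sit in; the broker address, group id, and client id below are illustrative placeholders, not values from the original post:

    using Confluent.Kafka;

    var config = new ConsumerConfig
    {
        BootstrapServers = "broker:9092",            // placeholder
        GroupId = "my-consumer-group",               // shared by both instances
        ClientId = "instance-a",                     // differs per instance
        AutoOffsetReset = AutoOffsetReset.Earliest,
        EnableAutoCommit = false,                    // offsets committed manually
    };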

The result is that a unique message from Kafka is processed twice.
How can we resolve this issue?

Also, when we read messages from Kafka, we store them in a database and then commit their offsets back to Kafka.
I am curious how we can resolve this...
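
In code, that flow presumably looks like the following minimal sketch (using the config above; SaveToDatabase and the topic name are hypothetical stand-ins). The gap between the database write and the commit is exactly the window in which a rebalance causes a redelivery:

    using System.Threading;
    using Confluent.Kafka;

    using var consumer = new ConsumerBuilder<string, string>(config).Build();
    consumer.Subscribe("my-topic");                  // placeholder topic name

    while (true)
    {
        var result = consumer.Consume(CancellationToken.None);

        SaveToDatabase(result.Message.Value);        // hypothetical helper, see stub below

        // If this instance loses the partition after the database write but
        // before the commit below, the new owner resumes from the last
        // committed offset and processes the same message again.
        consumer.Commit(result);
    }

    static void SaveToDatabase(string value) { /* hypothetical: INSERT the record */ }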

If I use transactional consumption, could I resolve this?
And what happens if two instances begin transactions with the same messages?

For example:
Instance A begins a transaction at timestamp t1 with ids 1, 2, 3, 4, 5.
Instance B begins a transaction at timestamp t2 (some ms after t1) with ids 4, 5, 6, 7, 8, 9.

What happens to the messages with the same ids 4 and 5?

Also, I wondered whether it would be a good choice to handle this myself by using, for example, a distributed Redis cache and checking the ids.

Answer 1

Score: 2


You have two options:

  1. Deal with the duplicate reads by making your processing idempotent. E.g. check in your database whether you've already stored the data, and drop the record if so (see the first sketch below).

  2. Handle the offset commits yourself. There's a good explanation in the Storing Offsets Outside Kafka section of the KafkaConsumer class of the Java client. Basically, it allows you to store both the offset and the actual data in your database atomically. Note that you'll need to handle the rebalance events yourself; there's an example of how to do that here: ConsumerRebalanceListener.

It seems that you're using the .NET client, so the equivalent would be to use ConsumerBuilder#SetPartitionsRevokedHandler.
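
For option 1, a minimal sketch of an idempotent write, assuming a PostgreSQL table with a unique constraint on the message id (the table and column names are made up for illustration):

    using Npgsql;

    static void SaveIfNew(NpgsqlConnection conn, string messageId, string payload)
    {
        // The unique constraint on message_id turns a re-processed duplicate into a no-op.
        const string sql = @"INSERT INTO processed_messages (message_id, payload)
                             VALUES (@id, @payload)
                             ON CONFLICT (message_id) DO NOTHING";
        using var cmd = new NpgsqlCommand(sql, conn);
        cmd.Parameters.AddWithValue("id", messageId);
        cmd.Parameters.AddWithValue("payload", payload);
        cmd.ExecuteNonQuery();                       // affects 0 rows when the id was already stored
    }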
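
For option 2, a sketch of the equivalent wiring in the .NET client; LoadOffsetFromDb and SaveOffsetsToDb are hypothetical helpers that would read and write offsets in the same database (ideally the same transaction) as the data itself:

    using System.Collections.Generic;
    using System.Linq;
    using Confluent.Kafka;

    var consumer = new ConsumerBuilder<string, string>(config)
        // On assignment, resume from the offsets stored in the database
        // rather than from the offsets committed to Kafka.
        .SetPartitionsAssignedHandler((c, partitions) =>
            partitions.Select(tp => new TopicPartitionOffset(tp, LoadOffsetFromDb(tp))))
        // On revocation (the start of a rebalance), persist progress so the
        // partition's next owner starts in the right place.
        .SetPartitionsRevokedHandler((c, offsets) => SaveOffsetsToDb(offsets))
        .Build();

    static Offset LoadOffsetFromDb(TopicPartition tp) => Offset.Beginning;  // stub: SELECT stored offset
    static void SaveOffsetsToDb(List<TopicPartitionOffset> offsets) { }     // stub: UPSERT offsets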

Answer 2

Score: 0


At-least-once processing is expected if you're handling offset commits on your own but only committing after you've successfully processed each record. Specifically, if your consumer rebalances after a record has been polled and processed, but before its offset is committed, the consumer that takes over will have to seek back to the last committed offset and will re-process those records.

Yes, transactions can help, but only within one consumer session. If you rebalance to an entirely new instance without committing, the whole transaction will be consumed again. You need to combine this logic with your own database transaction handling.

The way to work around this is either to commit more frequently, or to centrally store the ids that have already been processed in a highly available datastore. But there's no reason to add Redis when you already have a database, unless you expect Redis to be faster. (Looking up every event in this datastore may cause higher consumer lag and network IO, further increasing the likelihood that consumers will rebalance, so you'll need to increase the poll timeout configuration.)
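
If you do reach for Redis despite that caveat, the check is typically an atomic set-if-absent with a TTL; a minimal sketch using StackExchange.Redis, where the key prefix, expiry, and address are illustrative assumptions:

    using System;
    using StackExchange.Redis;

    var redis = ConnectionMultiplexer.Connect("localhost");  // placeholder address
    var db = redis.GetDatabase();

    // Returns false when the id was already marked, i.e. the message is a duplicate.
    bool FirstTimeSeen(string messageId) =>
        db.StringSet($"processed:{messageId}", 1, TimeSpan.FromDays(7), When.NotExists);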
