Optimal way of disaster tolerance during Kafka message consumption
Question
I currently have a service that consumes messages from a Kafka topic and does some computation, nothing more. It processes messages in batches (e.g. 1000 messages per batch) and commits the offset after each batch completes, since latency is not a problem. However, I realized that if the service processes 500 messages, crashes, and restarts, it may recompute those 500 messages, because no offset was committed and Kafka does not know how far the consumer got. How should I design the process to guarantee exactly-once computation without committing the offset after every single message? Again, latency is not a problem, but I don't want to pay the cost of committing an offset for every message.
Answer 1
Score: 0
Kafka can support transactional processing, so I would start with that.
But if you commit offsets only every 1000 records and crash after processing only batch[0..499], for example, then you need some downstream logic, outside the scope of Kafka, to keep you from handling those records again. For example, store the processed record IDs in Redis and do a fast hash lookup to see whether a record has already been processed. Sure, this is another point of failure, but that is the tradeoff for writing consumer code that isn't idempotent.
A restarted Kafka consumer will automatically rewind to the last committed offset, and start reading again, as if nothing happened.
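As a rough illustration of the rewind and the processed-ID lookup, here is a minimal in-memory sketch. Nothing in it talks to a real broker: the list `log` stands in for a topic partition, `state["committed"]` for the committed offset, and the Python set `seen` for a Redis set (where the lookup would be `SISMEMBER` and the insert `SADD`). The names `consume`, `Crash`, and `crash_after` are hypothetical, and the doubling is a placeholder for the real computation.

```python
class Crash(Exception):
    """Simulated process failure (hypothetical, for the demo only)."""


def consume(log, state, seen, results, batch_size=1000, crash_after=None):
    """Read `log` from the last committed offset, committing once per batch."""
    handled = 0
    while state["committed"] < len(log):
        start = state["committed"]
        batch = log[start:start + batch_size]
        for record_id, value in batch:
            if handled == crash_after:
                raise Crash("simulated crash mid-batch")
            handled += 1
            if record_id in seen:        # Redis equivalent: SISMEMBER
                continue                 # already computed before the crash
            results.append(value * 2)    # placeholder computation
            seen.add(record_id)          # Redis equivalent: SADD
        # The offset moves only after the whole batch is done.
        state["committed"] = start + len(batch)


log = [(i, i) for i in range(1000)]      # (record_id, value) pairs
state = {"committed": 0}
seen, results = set(), []

try:
    consume(log, state, seen, results, crash_after=500)
except Crash:
    pass                                 # offset was never committed

processed_before_restart = len(results)
offset_before_restart = state["committed"]

# Restart: reading resumes at offset 0, but the first 500 IDs are skipped.
consume(log, state, seen, results)
```

Note that this sketch still leaves a small window: a crash between the computation and the insert into `seen` would repeat that one record, so the result write and the ID write would need to be atomic to close the gap completely.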
An example of an idempotent record: (id, null) is a delete event; processing it again should do nothing, because that ID would already be gone from your downstream systems. But if you rewind far enough to see (id, data) again, the consumer would upsert that event once more, and the ID would stay present until (id, null) is seen again.
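The replay behaviour can be sketched with a plain dictionary standing in for the downstream system. The function `apply_record` and the sample events are hypothetical; the point is only that upserts and deletes keyed by ID converge to the same final state however many times the log is replayed.

```python
def apply_record(store, record):
    """Apply one (id, data) record; data of None means delete."""
    record_id, data = record
    if data is None:
        store.pop(record_id, None)   # delete: a no-op if the ID is already gone
    else:
        store[record_id] = data      # upsert: same outcome on every replay


events = [("a", {"n": 1}), ("b", {"n": 2}), ("a", None)]

store = {}
for event in events:
    apply_record(store, event)
state_after_first_pass = dict(store)

# Rewind to the start of the log and replay everything.
apply_record(store, events[0])
a_visible_mid_replay = "a" in store  # "a" is back until its delete is seen again
for event in events[1:]:
    apply_record(store, event)
state_after_replay = dict(store)
```

During the replay the deleted ID reappears for a while, which is harmless only if downstream readers can tolerate that temporary state.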