Will Kafka Streams guarantee at-least-once processing in stateful processors even when exactly-once is disabled?


Question


This question comes to mind because we are running Kafka Streams applications without EOS enabled due to infrastructure constraints. We are unsure of the behavior when doing custom logic using the transformer/processor API with changelog-backed state stores.

Say we are using the following topology to de-duplicate records before sending them downstream:

[topic] -> [flatTransformValues + state store] -> [...(downstream)]

The transformer here compares each incoming record against the state store and only forwards + updates the record when the value has changed, so for messages [A:1], [A:1], [A:2] we expect downstream to receive only [A:1], [A:2].
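Roughly, the transformer looks like this (a simplified sketch; the store/topic names and the String serdes are placeholders, not our actual code):

```java
import java.util.Collections;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class DedupTopology {

    // Store and topic names are placeholders for illustration only.
    static final String STORE = "dedup-store";
    static final String INPUT = "input-topic";
    static final String OUTPUT = "output-topic";

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Persistent, changelog-backed store holding the last forwarded value per key.
        StoreBuilder<KeyValueStore<String, String>> dedupStore =
                Stores.keyValueStoreBuilder(
                        Stores.persistentKeyValueStore(STORE),
                        Serdes.String(), Serdes.String());
        builder.addStateStore(dedupStore);

        builder.stream(INPUT, Consumed.with(Serdes.String(), Serdes.String()))
               .flatTransformValues(() -> new ValueTransformerWithKey<String, String, Iterable<String>>() {
                   private KeyValueStore<String, String> store;

                   @SuppressWarnings("unchecked")
                   @Override
                   public void init(ProcessorContext context) {
                       store = (KeyValueStore<String, String>) context.getStateStore(STORE);
                   }

                   @Override
                   public Iterable<String> transform(String key, String value) {
                       String previous = store.get(key);
                       if (value != null && value.equals(previous)) {
                           // Same value as last time: drop the record.
                           return Collections.emptyList();
                       }
                       // Note the ordering: the store (and its changelog) is updated
                       // before the record is forwarded downstream.
                       store.put(key, value);
                       return Collections.singletonList(value);
                   }

                   @Override
                   public void close() { }
               }, STORE)
               .to(OUTPUT, Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }
}
```

The relevant detail is that the store (and therefore its changelog) is updated before the record is forwarded downstream.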

The question is: when a failure happens, is it possible that [A:2] gets stored in the state store's changelog while downstream never receives the message, so that any retry that re-reads [A:2] will discard the record and it is lost forever?

If not, please tell me which mechanism prevents this from happening. One way I think it could work is if Kafka Streams only produced to the changelog topics and committed offsets after the produce to downstream had succeeded?

Much appreciated!

Answer 1

Score: 0


> When a failure happens, is it possible that [A:2] gets stored in the state store's changelog while downstream never receives the message, so that any retry that re-reads [A:2] will discard the record and it is lost forever?

Yes, that's possible. At-least-once only guarantees that the event will be re-read and re-processed. But in your case, the already-modified state would change the second round of processing and classify the event as a duplicate.

In the end, it does not really make sense to write a de-duplication program without exactly-once guarantees. Even if you could prevent the scenario you describe, at-least-once processing can by itself introduce duplicates downstream...
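If the infrastructure constraints can ever be lifted, exactly-once is a single configuration change. A minimal sketch (the application id and bootstrap servers below are placeholders):

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class ExactlyOnceConfig {
    public static Properties props() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dedup-app");          // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        // EXACTLY_ONCE_V2 requires Kafka Streams 3.0+ and brokers 2.5+;
        // on older versions use StreamsConfig.EXACTLY_ONCE instead.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }
}
```

With exactly-once enabled, the changelog write, the downstream produce, and the offset commit are committed as one transaction, so the inconsistency described in the question cannot become visible.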

huangapple
  • Published 2023-02-20 00:42:27
  • Please retain this link when reposting: https://go.coder-hub.com/75501712.html