Will Kafka Streams guarantee at-least-once processing in stateful processors even when exactly-once is disabled?
Question
This question comes to mind because we are running Kafka Streams applications without EOS enabled, due to infrastructure constraints. We are unsure of the behavior when doing custom logic using the transformer/processor API with changelogged state stores.
Say we are using the following topology to de-duplicate records before sending them downstream:
[topic] -> [flatTransformValues + state store] -> [...(downstream)]
The transformer here compares incoming records against the state store and only forwards + updates the record when there is a value change, so for messages [A:1], [A:1], [A:2], we expect downstream to only get [A:1], [A:2].
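A minimal sketch of such a de-duplicating transformer, assuming String keys and values and hypothetical store/topic names (the original topology code is not shown in the question), might look like this:

```java
import java.util.Collections;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.kstream.ValueTransformerWithKeySupplier;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class DedupTopologySketch {

    private static final String STORE_NAME = "dedup-store"; // hypothetical store name

    public static StreamsBuilder buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // Changelog-backed (fault-tolerant) key-value store holding the last seen value per key.
        builder.addStateStore(Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore(STORE_NAME),
                Serdes.String(), Serdes.String()));

        // Explicitly typed supplier avoids overload ambiguity between the
        // ValueTransformerSupplier and ValueTransformerWithKeySupplier variants.
        ValueTransformerWithKeySupplier<String, String, Iterable<String>> dedup = DedupTransformer::new;

        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .flatTransformValues(dedup, STORE_NAME)
               .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

        return builder;
    }

    static class DedupTransformer implements ValueTransformerWithKey<String, String, Iterable<String>> {

        private KeyValueStore<String, String> store;

        @Override
        @SuppressWarnings("unchecked")
        public void init(final ProcessorContext context) {
            store = (KeyValueStore<String, String>) context.getStateStore(STORE_NAME);
        }

        @Override
        public Iterable<String> transform(final String key, final String value) {
            final String previous = store.get(key);
            if (previous != null && previous.equals(value)) {
                return Collections.emptyList();   // unchanged value -> drop as duplicate
            }
            store.put(key, value);                // update state, then forward
            return Collections.singletonList(value);
        }

        @Override
        public void close() { }
    }
}
```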
The question is: when a failure happens, is it possible that [A:2] gets stored in the state store's changelog while downstream does not receive the message, so that any retry reading [A:2] will discard the record and it is lost forever?
If not, please tell me what mechanism prevents this from happening. One way I think it could work is if Kafka Streams only produced to the changelog topics and committed offsets after the produce to downstream succeeds?
Much appreciated!
Answer 1
Score: 0
> The question is: when a failure happens, is it possible that [A:2] gets stored in the state store's changelog while downstream does not receive the message, so that any retry reading [A:2] will discard the record and it is lost forever?
Yes, that's possible. At-least-once only guarantees that the event will be re-read and re-processed. In your case, however, the already-updated state would change the second round of processing and cause the event to be detected as a duplicate.
In the end, it does not make much sense to write a de-duplication program without using exactly-once guarantees. Even if you could prevent the scenario you describe, at-least-once processing could introduce duplicates by itself...
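For reference, switching the processing guarantee is a single Streams config. A minimal sketch, assuming Kafka Streams 3.0+ (where `StreamsConfig.EXACTLY_ONCE_V2` is available; brokers must be 2.5+) and hypothetical application-id/broker values:

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class ProcessingGuaranteeSketch {

    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dedup-app");          // hypothetical application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // hypothetical brokers
        // Default is StreamsConfig.AT_LEAST_ONCE; with EOS enabled, the changelog write,
        // the downstream produce, and the offset commit are completed in a single transaction.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }
}
```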