Kafka KStream to KStream join | restart performance

Question

I'm planning to join two topics as KStreams over a long window (~1 week). Assuming hundreds of millions of records accumulate in this window, how long will the joining consumer take to restart? I'm asking because I couldn't find any information on how many of the window's records are stored in the consumer cache.

Answer 1

Score: 2

By default, data buffered in a window is stored in RocksDB, i.e., on local disk. Hence, on a restart on the same machine, nothing needs to be reloaded because the data is already available.

If you restart on a different machine, the whole content of the store has to be re-read from the Kafka topic that backs up the store to guarantee fault tolerance. How long this takes depends on many factors and is hard to estimate. You can, however, register a "restore callback" to monitor the restore process, which lets you run experiments to get a sense of how long it may take.
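To make such an experiment concrete, here is a minimal sketch of the bookkeeping a restore callback could do. `RestoreProgress` is a hypothetical helper, not part of Kafka Streams. The assumption is that you feed it from a `StateRestoreListener` registered via `KafkaStreams#setGlobalStateRestoreListener`: create it in `onRestoreStart` with the starting and ending offsets, and call `batchRestored` from `onBatchRestored` with `numRestored`. That wiring is omitted so the snippet compiles without the Kafka dependency. The offset range is treated as an approximation of the record count, which may overestimate if the changelog has offset gaps.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical helper (not a Kafka Streams class): tracks restore progress of one store partition. */
final class RestoreProgress {
    private final long totalToRestore; // approximated as endingOffset - startingOffset
    private final long startNanos;     // timestamp taken when the restore started
    private long restored;             // records restored so far

    RestoreProgress(long startingOffset, long endingOffset, long startNanos) {
        this.totalToRestore = Math.max(0L, endingOffset - startingOffset);
        this.startNanos = startNanos;
    }

    /** Intended to be called once per restored batch with the batch's record count. */
    void batchRestored(long numRestored) {
        restored += numRestored;
    }

    /** Records still to restore, never negative. */
    long remaining() {
        return Math.max(0L, totalToRestore - restored);
    }

    /** Average restore rate since the start, in records per second. */
    double recordsPerSecond(long nowNanos) {
        double elapsedSeconds = (nowNanos - startNanos) / 1e9;
        return elapsedSeconds <= 0 ? 0.0 : restored / elapsedSeconds;
    }

    /** Estimated time left at the average rate; empty while no rate is known yet. */
    Optional<Duration> estimatedRemaining(long nowNanos) {
        long left = remaining();
        if (left == 0) {
            return Optional.of(Duration.ZERO);
        }
        double rate = recordsPerSecond(nowNanos);
        if (rate <= 0) {
            return Optional.empty();
        }
        return Optional.of(Duration.ofMillis(Math.round(left / rate * 1000.0)));
    }

    public static void main(String[] args) {
        // Made-up numbers for illustration: 1,000,000 records to restore,
        // 250,000 of them restored after 10 seconds.
        RestoreProgress progress = new RestoreProgress(0L, 1_000_000L, 0L);
        progress.batchRestored(250_000L);
        long now = 10_000_000_000L; // 10 seconds in nanoseconds
        System.out.println("rate (records/s): " + progress.recordsPerSecond(now));
        System.out.println("remaining records: " + progress.remaining());
        System.out.println("estimated time left: " + progress.estimatedRemaining(now));
    }
}
```

Logging these figures while restoring a test store of known size on a fresh machine gives a measured restore rate, which can then be scaled to the expected size of the one-week window. The result only holds for the hardware, network, and broker setup used in the experiment.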

  • Posted by huangapple on January 3, 2020, 19:32:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59577885.html