Kafka KStream to KStream join | restart performance
Question
I'm planning to join two topics as KStreams over a long window (~1 week). Assuming hundreds of millions of records accumulate in this window, how long will the joining consumer take to restart? I'm asking because I was unable to find information on how many of the window's records are stored in the consumer cache.
Answer 1
Score: 2
By default, data that is buffered in a window is stored in RocksDB, i.e., on local disk. Hence, on restart (on the same machine) nothing needs to be reloaded, as the data is already available.
If you restart on a different machine, the whole content of the store needs to be re-read from a Kafka topic (the changelog topic that backs up the store to guarantee fault tolerance). How long this takes depends on many factors and is hard to estimate. You can, however, register a "restore callback" to monitor the restore process. This should let you run some experiments to get insight into how long it may take.
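For the experiments, something like the following could turn the restore callbacks into a throughput and time-remaining figure. This is only a rough sketch: `RestoreProgress` and its method names are hypothetical (not part of Kafka Streams), it has no Kafka dependency, and it assumes that the difference between the ending and starting changelog offsets approximates the number of records to replay, which may overcount on compacted or transactional topics.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Kafka Streams): accumulates restore progress
// per store/partition key and derives throughput and a rough time-to-finish.
final class RestoreProgress {

    private static final class State {
        final long total;        // offsets to replay: endingOffset - startingOffset
        final long startMillis;  // time at which the restore began
        long restored;           // records replayed so far

        State(long total, long startMillis) {
            this.total = total;
            this.startMillis = startMillis;
        }
    }

    private final Map<String, State> states = new HashMap<>();

    // Intended to be called from StateRestoreListener#onRestoreStart.
    synchronized void start(String key, long startingOffset, long endingOffset, long nowMillis) {
        states.put(key, new State(Math.max(0L, endingOffset - startingOffset), nowMillis));
    }

    // Intended to be called from StateRestoreListener#onBatchRestored with numRestored.
    synchronized void batch(String key, long numRestored) {
        State s = states.get(key);
        if (s != null) {
            s.restored += numRestored;
        }
    }

    // Records still to replay for this key (0 if the key is unknown).
    synchronized long remaining(String key) {
        State s = states.get(key);
        return s == null ? 0L : Math.max(0L, s.total - s.restored);
    }

    // Average replay rate since the restore started.
    synchronized double recordsPerSecond(String key, long nowMillis) {
        State s = states.get(key);
        if (s == null) {
            return 0.0;
        }
        long elapsed = nowMillis - s.startMillis;
        return elapsed <= 0 ? 0.0 : s.restored * 1000.0 / elapsed;
    }

    // Rough estimate of the seconds left, assuming the average rate holds.
    synchronized double etaSeconds(String key, long nowMillis) {
        double rate = recordsPerSecond(key, nowMillis);
        return rate == 0.0 ? Double.POSITIVE_INFINITY : remaining(key) / rate;
    }
}
```

In a real application these methods would be called from an implementation of `org.apache.kafka.streams.processor.StateRestoreListener`, registered with `KafkaStreams#setGlobalStateRestoreListener` before `start()`, using a key such as the store name plus the partition number and `System.currentTimeMillis()` for the timestamps. The listener can be invoked from several stream threads, which is why the methods are synchronized.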