RocksDB seems to load full KTable statestore in memory

Question
In my Kafka Streams topology, I join a KTable with a KStream and the application keeps crashing because of memory: the join DSL creates one state store for the KTable, and while this state store is small the memory usage is low, but as the KTable grows with new messages the memory grows too. At some point the size of the state store exceeds the memory allocated to the stream application and it crashes.
My question is: with a KTable/KStream join, does RocksDB load all the content of the state store into the LRU cache? That seems like odd behavior to me, because only a small part of the KTable is used by the stream at any given point, and RocksDB is meant to flush to disk; otherwise it is no different from an in-memory store.
Thanks for your help.
P.S.: I already tried to bound the RocksDB memory with the config setter, but it doesn't change much. At some point it exceeds the configured cache size, or throws an LRUCache limit exception if I enable strict mode.
Answer 1
Score: 1
> With a KTable/KStream join, does RocksDB load all the content of the state store into the LRU cache?
If you refer to Kafka Streams' caching layer, no. The size of that cache is bounded via `statestore.cache.max.bytes` (in older releases, `cache.max.bytes.buffering`).
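For reference, a minimal sketch of how that setting might be applied when building the Streams configuration; the application id, bootstrap server, and the 10 MiB value are placeholders, and on older releases the deprecated `cache.max.bytes.buffering` key would be used instead:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class CacheConfigExample {

  public static Properties streamsProps() {
    final Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ktable-join-app");    // placeholder id
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder broker
    // Bounds the Kafka Streams record-caching layer only, not RocksDB's own memory.
    props.put(StreamsConfig.STATESTORE_CACHE_MAX_BYTES_CONFIG, 10 * 1024 * 1024L);
    // Older releases: props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);
    return props;
  }
}
```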
For RocksDB's off-heap caches, you can implement a `RocksDBConfigSetter` that you pass via `StreamsConfig` to bound the memory RocksDB uses. Cf. https://docs.confluent.io/platform/current/streams/developer-guide/memory-mgmt.html#rocksdb
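As a rough sketch of that approach, following the pattern from the linked memory-management docs (the class name and the cache/memtable sizes below are placeholders, not recommendations):

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

  // Shared (static) across all state store instances so one bound covers the whole application.
  private static final long TOTAL_OFF_HEAP_MEMORY = 128 * 1024 * 1024L; // placeholder: 128 MiB
  private static final long TOTAL_MEMTABLE_MEMORY = 32 * 1024 * 1024L;  // placeholder: 32 MiB

  private static final Cache cache = new LRUCache(TOTAL_OFF_HEAP_MEMORY);
  private static final WriteBufferManager writeBufferManager =
      new WriteBufferManager(TOTAL_MEMTABLE_MEMORY, cache);

  @Override
  public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
    final BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
    // Use one shared block cache for all stores and charge memtable memory against it,
    // so RocksDB's usage is counted against a single limit.
    tableConfig.setBlockCache(cache);
    options.setWriteBufferManager(writeBufferManager);
    options.setTableFormatConfig(tableConfig);
    // The linked docs also tune index/filter-block caching, block size, and memtable
    // counts; whether to cache or pin index and filter blocks is discussed further down.
  }

  @Override
  public void close(final String storeName, final Options options) {
    // Do not close the shared cache/write buffer manager here; they are reused by every store.
  }
}
```

The class is then registered via `props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryRocksDBConfig.class);`.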
> P.S.: I already tried to bound the RocksDB memory with the config setter, but it doesn't change much. At some point it exceeds the configured cache size, or throws an LRUCache limit exception if I enable strict mode.
So it is indeed about RocksDB? In that case, it seems your `RocksDBConfigSetter` needs to be changed. I would recommend inspecting RocksDB's local `LOG` file (a good tool is https://github.com/speedb-io/log-parser) to see where the memory usage goes. It could be related to pinning (e.g., `tableConfig.setCacheIndexAndFilterBlocks(true);`): if pinning is used, it takes priority over the memory limit and might violate it. In that case, you should disable pinning.
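For illustration, one way that could look inside a `RocksDBConfigSetter`, assuming the standard `BlockBasedTableConfig` pinning options are what is being pinned in your setup (a hedged sketch with a made-up class name, not a drop-in fix):

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

public class NoPinningRocksDBConfig implements RocksDBConfigSetter {

  @Override
  public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
    final BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
    // Turn off the pinning-related options so blocks stay evictable and the
    // block-cache limit can actually be enforced.
    tableConfig.setPinL0FilterAndIndexBlocksInCache(false);
    tableConfig.setPinTopLevelIndexAndFilter(false);
    // Depending on what the LOG file shows, tableConfig.setCacheIndexAndFilterBlocks(...)
    // (mentioned above) may also need to be reconsidered.
    options.setTableFormatConfig(tableConfig);
  }

  @Override
  public void close(final String storeName, final Options options) {
    // Nothing shared to release in this sketch.
  }
}
```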