kafka __consumer_offsets disk usage is huge and asymmetric
Question
I'm wondering what could cause this and how to fix it. Disk usage for the Kafka __consumer_offsets topic is huge (134 GB) and asymmetric: most of it is on broker 3, and most of that is in a single partition. ReplicationFactor=3 and there are 3 brokers, so I would at least expect symmetry, although I am more concerned with reducing the size.
The cluster is MSK 2.8.1, and the command-line tools are from Confluent 6.2.10.
$ kafka-log-dirs --describe --bootstrap-server $BOOTSTRAP --topic-list __consumer_offsets | grep '^{' | jq -r '.brokers[] | ["broker", .broker, "=", (([.logDirs[].partitions[].size] | add // 0) | . / 10000 | round | ./ 100), "MB" ] | @tsv' | paste -sd , | tr '\t' ' '
broker 1 = 459.72 MB,broker 2 = 218.95 MB,broker 3 = 134346.48 MB
$ kafka-log-dirs --describe --bootstrap-server $BOOTSTRAP --topic-list __consumer_offsets | grep '^{' | jq -r '.brokers[] | ["broker", .broker, "=", (.logDirs[].partitions[].size / 1000000 | round)] | @tsv' | tr '\t' ' '
broker 1 = 52 1 0 0 1 0 0 243 102 0 2 0 3 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 47 4 0 0 0 0 0 0 5 0 0 0 0 1 2 3 1 0 0 0
broker 2 = 52 1 0 0 1 0 0 2 102 0 2 0 3 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 47 4 0 0 0 0 0 0 5 0 0 0 0 1 2 3 1 0 0 0
broker 3 = 133907 1 0 0 8 3 1 31 10 4 2 0 27 0 2 0 14 8 4 4 1 0 3 2 0 10 0 0 3 14 35 123 0 0 2 0 0 0 23 0 0 0 0 25 26 39 9 3 6 5
$ kafka-topics --bootstrap-server $BOOTSTRAP --describe --topic __consumer_offsets
Topic: __consumer_offsets TopicId: ... PartitionCount: 50 ReplicationFactor: 3 Configs: compression.type=producer,min.insync.replicas=2,cleanup.policy=compact,segment.bytes=104857600,message.format.version=2.8-IV1,max.message.bytes=10485880,unclean.leader.election.enable=true
...
Answer 1
Score: 1
There's not much that can really be done about this at this point.
tl;dr You have consumer group names that hash to the same partition, or you have one really large consumer group that commits very frequently.
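To check whether several group names land on the same partition, the mapping can be reproduced offline. This is a minimal sketch, assuming the coordinator picks the partition as the absolute value of the Java hashCode of group.id, modulo the partition count (50 in the topic description above). The function names and the example group names are made up for illustration.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode(), returning a signed 32-bit int."""
    h = 0
    data = s.encode("utf-16-be")  # Java hashes UTF-16 code units
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        h = (31 * h + unit) & 0xFFFFFFFF  # wrap like 32-bit int arithmetic
    return h - (1 << 32) if h >= (1 << 31) else h


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Partition of __consumer_offsets assumed to hold this group's commits."""
    h = java_string_hash(group_id)
    # Assumption: mirror an abs() that maps Integer.MIN_VALUE to 0.
    positive = 0 if h == -(1 << 31) else abs(h)
    return positive % num_partitions


# Hypothetical group names; substitute the group.id values from your consumers.
for group in ["orders-service", "billing-service", "audit-log-reader"]:
    print(group, "->", offsets_partition_for(group))
```

Running this over the real list of group.id values shows which groups share the oversized partition, assuming the hashing rule above matches your broker version.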
- The topic is compacted, so the latest record for each key stays around. If consumers commit faster than compaction runs, that partition grows quickly.
- You can consume that topic to inspect it (make sure you add --property print.key=true); you will notice that the keys are based on the group.id set in your consumer code bases.
- If you change group.id for your consumers (assuming you can track them down), they will all lose their existing offsets for the data they've already consumed unless you manually migrate the offsets using a combination of the consumer API's seek and commitSync; there is no built-in script to do this for you.
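Before consuming the topic, it helps to know which partition number is the large one, since the jq output in the question lists sizes without partition names. The sketch below ranks partitions from the JSON that kafka-log-dirs --describe prints, assuming the usual brokers / logDirs / partitions layout with partition and size fields. The function name and the sample data are invented for illustration and are not taken from the cluster in the question.

```python
import json


def largest_partitions(log_dirs_json: str, top: int = 3):
    """Return the `top` largest (size_bytes, broker, partition) tuples."""
    report = json.loads(log_dirs_json)
    rows = []
    for broker in report["brokers"]:
        for log_dir in broker["logDirs"]:
            for part in log_dir["partitions"]:
                rows.append((part["size"], broker["broker"], part["partition"]))
    return sorted(rows, reverse=True)[:top]


# Made-up sample in the assumed kafka-log-dirs JSON shape.
sample = json.dumps({
    "version": 1,
    "brokers": [
        {"broker": 1, "logDirs": [{"logDir": "/data", "error": None, "partitions": [
            {"partition": "__consumer_offsets-7", "size": 243_000_000},
            {"partition": "__consumer_offsets-8", "size": 102_000_000},
        ]}]},
        {"broker": 3, "logDirs": [{"logDir": "/data", "error": None, "partitions": [
            {"partition": "__consumer_offsets-0", "size": 133_907_000_000},
            {"partition": "__consumer_offsets-7", "size": 31_000_000},
        ]}]},
    ],
})

for size, broker, partition in largest_partitions(sample):
    print(f"broker {broker} {partition}: {size / 1e6:.0f} MB")
```

In practice the input would be the JSON line from the kafka-log-dirs command shown in the question instead of the sample string.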
If you end up "moving" that partition, or segments of its log, for example, consumers will start seeing errors, since the broker will still try to use "partition X" to fetch and commit offsets.
In practice, people often have a consumer/topic onboarding form, associated with ACL policies and expected usage quotas, and that "Kafka onboarding process™" can enforce specific consumer group naming conventions.