Why would Kafka's disk usage cyclically drop every 7 days?

Question


Such a pattern isn't seen on another cluster, which also has the (default) retention set to 7 days.

The cluster in question is v2.8.1 and has these configurations set (everything else is left to defaults):

auto.create.topics.enable = false
delete.topic.enable = true
num.partitions = 10
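For reference, since nothing else is overridden above, the broker defaults that drive the 7-day window would apply. A sketch of the relevant defaults (as documented for Kafka 2.8.x; worth verifying against your broker's actual config):

```properties
# Broker defaults relevant to the 7-day pattern (Kafka 2.8.x):
log.cleanup.policy=delete                # delete old segments rather than compact them
log.retention.hours=168                  # 7 days of time-based retention
log.retention.check.interval.ms=300000   # log cleaner checks for deletable segments every 5 minutes
```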

Below is a metric graph from CloudWatch, as an example of the 7-day pattern mentioned in the title.

[CloudWatch graph: KafkaDataLogsDiskUsed climbing steadily, then dropping sharply every 7 days]

The metric KafkaDataLogsDiskUsed graphed above is defined as:
> The percentage of disk space used for data logs.

Does "data logs" include Kafka's own/internal logs? One explanation of the graph above would be that some internal/bookkeeping process fills in "some" logs every 7 days, all at once (and those logs then get discarded, also almost simultaneously, after another 7 days, or whatever retention.ms is set to). This pattern would be more visible on this toy cluster, with not much "real" data flowing in.

Answer 1

Score: 3


As you mentioned, the default retention for all Kafka topics is 7 days, and, more importantly, the default log.cleanup.policy is set to delete (as opposed to compact). Since the logs for all of your Kafka topics are stored on disk, the trend you are seeing is not surprising.

Once the default retention threshold (log.retention.ms) has been exceeded, a cleanup process eventually runs and deletes all segments older than that threshold; since those segments are stored on disk, the deletion shows up as the sharp drop in your graph.
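The cleanup pass described above can be sketched in a few lines. This is a hypothetical simulation, not Kafka's actual cleaner code: it models closed log segments as (last-modified timestamp, size) pairs and deletes everything older than the retention window in one pass, which is what produces the "fill for 7 days, drop at once" sawtooth in the disk-usage graph.

```python
from datetime import datetime, timedelta

# Hypothetical sketch of time-based retention with log.cleanup.policy=delete.
RETENTION = timedelta(days=7)  # mirrors the default log.retention.hours = 168

def clean_segments(segments, now):
    """Return the segments that survive a cleanup pass at time `now`.

    `segments` is a list of (last_modified, size_bytes) tuples for closed
    log segments; anything older than the retention window is deleted.
    """
    return [(ts, size) for ts, size in segments if now - ts <= RETENTION]

now = datetime(2023, 5, 25)
segments = [
    (now - timedelta(days=8), 1024),  # older than retention -> deleted
    (now - timedelta(days=6), 2048),  # within retention -> kept
    (now - timedelta(days=1), 512),   # within retention -> kept
]
kept = clean_segments(segments, now)
print(len(kept), sum(size for _, size in kept))  # -> 2 2560
```

Because the cleaner only considers *closed* segments, a low-traffic cluster where a segment fills (or rolls) only occasionally will see disk usage build up and then fall in large discrete steps, exactly the pattern in the graph.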

If you need more detail, this blog post does an excellent job going over the retention and clean-up process.

huangapple
  • Posted on 2023-05-25 22:15:29
  • Please keep this link when reposting: https://go.coder-hub.com/76333287.html