Is it possible to estimate the percentage of tombstones in a table?

Question

I am currently working on a script that calculates the percentage of tombstones in a target table; if it exceeds a threshold percentage, I want to give the user the option to run compaction.
So is it possible to estimate the percentage of tombstones in a table, and if so, how?
During my research I found the system.compaction_history table, which has table_name, bytes_in and bytes_out columns that I could base decisions on, but I'm not sure whether my logic is correct.

Thanks in advance for your help.

Regards,
Jay

Answer 1

Score: 2

Use sstablemetadata instead. Its output includes a field like this:

Estimated droppable tombstones: 0.9188263888888889

But be careful with this value, because it's not a percentage of droppable tombstones. It's an estimate of the ratio of droppable tombstones to non-droppable columns within a single SSTable, i.e. the proportion of tombstones that could be removed during compaction relative to the total number of non-droppable columns present.
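
A minimal sketch of how a script like the one in the question might read this value, assuming each SSTable's sstablemetadata output contains a line in the format shown (the exact output can differ between Cassandra versions). The helper names and any threshold you pass in are hypothetical, and the result is a per-SSTable estimate, not a percentage of tombstones in the table.

```python
import re
import subprocess
from typing import Dict, List, Optional

# Assumed output line format (as in the answer):
#   "Estimated droppable tombstones: 0.9188263888888889"
_RATIO_RE = re.compile(
    r"Estimated droppable tombstones:\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)"
)


def run_sstablemetadata(data_file: str) -> str:
    """Run the sstablemetadata tool on one SSTable (e.g. a *-Data.db file)
    and return its text output. Assumes the tool is on PATH."""
    result = subprocess.run(
        ["sstablemetadata", data_file],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_droppable_ratio(output: str) -> Optional[float]:
    """Extract the estimated droppable-tombstone ratio, or None if absent."""
    match = _RATIO_RE.search(output)
    return float(match.group(1)) if match else None


def sstables_over_threshold(outputs: Dict[str, str], threshold: float) -> List[str]:
    """Given {sstable_path: sstablemetadata output}, return the paths whose
    estimated ratio is above `threshold`, sorted for stable reporting."""
    flagged = []
    for path, text in outputs.items():
        ratio = parse_droppable_ratio(text)
        if ratio is not None and ratio > threshold:
            flagged.append(path)
    return sorted(flagged)
```

For example, `sstables_over_threshold({p: run_sstablemetadata(p) for p in paths}, 0.2)` would list the SSTables whose estimate exceeds a (hypothetical) 0.2. The check stays per SSTable because the tool reports one estimate per SSTable; it does not by itself say whether any compaction is warranted.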

Answer 2

Score: 0

Determining the percentage of tombstones requires a full table scan, which is a terrible idea in Cassandra since such an operation doesn't scale.

SSTable tools such as sstablemetadata can provide an estimated ratio of droppable tombstones, but that is not the same as getting the number of tombstones. In any case, the ratio it provides is a very rough estimate, since the algorithm used to calculate it relies on estimated column counts rather than the actual number of columns in the partitions or rows, as I've explained in my response to this question on DBA Stack Exchange.
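
To illustrate why the value is only a ratio against an estimated count, here is a small worked sketch. It assumes the estimate divides a histogram-based count of droppable tombstones by an estimated cell count (mean cells per partition times number of partitions), which appears to mirror Cassandra's StatsMetadata but may differ between versions; the function name and numbers are hypothetical.

```python
def estimated_droppable_ratio(droppable_tombstones: float,
                              mean_cells_per_partition: float,
                              partition_count: int) -> float:
    """Hypothetical re-creation of the estimate: droppable tombstones divided
    by an *estimated* cell count, not by an exact count from scanning data."""
    # In this assumed model, the inputs come from histograms in the SSTable
    # metadata, so they are approximations rather than exact counts.
    estimated_cells = mean_cells_per_partition * partition_count
    if estimated_cells <= 0:
        return 0.0
    return droppable_tombstones / estimated_cells


# 2,500 droppable tombstones against ~10 cells x 1,000 partitions
print(estimated_droppable_ratio(2500, 10, 1000))  # 0.25
```

If the mean cells-per-partition figure is off, the ratio shifts with it even though no data changed, which is why the value is best read as a rough signal rather than a tombstone percentage.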

More importantly, the answer to the question you didn't ask is that it is never a good idea to manually trigger a major compaction. In most cases, it will cause more problems than you're trying to solve.

I've explained this in my post on why major compactions are a bad idea. What you need to do instead is address the underlying root cause. Cheers!

  • Published on 2023-07-13 17:36:55
  • If reposting, please keep the link to this article: https://go.coder-hub.com/76677951.html