What is the most effective way to retain data for longer in Cassandra
Question
I'm looking for an effective way to retain data longer in Cassandra.
Example: I want to retain the last 2 years' worth of data for real-time queries; the 2 years prior to that would not be queried, but must be retained for audit purposes if needed.
- Is there a way to achieve that?
Looking forward to practical options, if any.
Version: Cassandra 3.11 / 4.0
Answer 1
Score: 1
Without knowing your access pattern (specifically the primary key) and write pattern (batch/streaming), here are several suggestions to solve your problem:
1/ Historicised table with TTLs
Design a table that stores data under temporal keys and apply a TTL of 4 years to each row written.
For audit-only data (older than 2 years), filter on the date in your back-end API to block retrieval.
Suggested PK + CC: ((year, month, day), dataId)
Note that the TTL applies at the column level (see this Medium article).
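For illustration, a minimal CQL sketch of such a table; the keyspace, table, and column names are hypothetical. It uses the table-level default_time_to_live option so every written row inherits a 4-year TTL, with a per-write USING TTL shown as an alternative:

```sql
-- Hypothetical historicised table; names are illustrative.
-- default_time_to_live is in seconds; Cassandra expires each row
-- automatically once its TTL has elapsed.
CREATE TABLE IF NOT EXISTS audit.events_by_day (
    year    int,
    month   int,
    day     int,
    data_id uuid,
    payload text,
    PRIMARY KEY ((year, month, day), data_id)
) WITH default_time_to_live = 126230400;  -- ~4 years (4 * 365.25 * 86400)

-- Equivalent per-write TTL, which overrides the table default:
INSERT INTO audit.events_by_day (year, month, day, data_id, payload)
VALUES (2023, 6, 15, uuid(), 'example')
USING TTL 126230400;
```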
2/ Historicised table with scheduled purge
Same as the previous suggestion, but use a scheduled job instead of a TTL to purge data older than 4 years.
To achieve that, use the Spark Cassandra Connector to extract the historicised table from Cassandra, identify the rows to delete, and purge them in Cassandra; a sketch follows below.
I personally recommend the DataFrame API over RDDs (official GitHub project).
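A minimal sketch of that purge job, assuming the hypothetical audit.events_by_day table from the first suggestion (contact point and job wiring are illustrative). The DataFrame API does the scan and filtering; the final delete goes through the connector's RDD-level deleteFromCassandra helper, since the DataFrame writer of that era does not expose deletes:

```scala
import com.datastax.spark.connector._
import org.apache.spark.sql.SparkSession
import java.time.LocalDate

object AuditPurgeJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-audit-purge")
      .config("spark.cassandra.connection.host", "127.0.0.1") // assumed contact point
      .getOrCreate()
    import spark.implicits._

    val cutoff = LocalDate.now().minusYears(4)

    // Read the historicised table with the DataFrame API.
    val rows = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "audit", "table" -> "events_by_day"))
      .load()

    // Keep only the primary-key columns of rows older than the cutoff.
    val expired = rows
      .select($"year", $"month", $"day", $"data_id")
      .filter(
        $"year" < cutoff.getYear ||
        ($"year" === cutoff.getYear && $"month" < cutoff.getMonthValue) ||
        ($"year" === cutoff.getYear && $"month" === cutoff.getMonthValue &&
          $"day" < cutoff.getDayOfMonth))

    // Delete the matching rows by full primary key.
    expired.rdd
      .map(r => (r.getInt(0), r.getInt(1), r.getInt(2), r.getString(3)))
      .deleteFromCassandra("audit", "events_by_day",
        keyColumns = SomeColumns("year", "month", "day", "data_id"))

    spark.stop()
  }
}
```

One design note: because the partition key is (year, month, day), the deletes target whole days' partitions by key, which keeps tombstone impact predictable; schedule the job during low-traffic windows and let compaction reclaim the space.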