What is the most effective way to retain data for longer in Cassandra
Question
I'm looking for an effective way to retain data longer in Cassandra.
Example: I want to retain the last 2 years' worth of data for real-time queries; the 2 years prior to that would not be queried, but must be retained for audit purposes if needed.
- Is there a way to achieve that?
Looking forward to practical options, if any.
Version: Cassandra 3.11 / 4.0
Answer 1
Score: 1
Without knowing your access pattern (specifically the primary key) and write pattern (batch/streaming), here are several suggestions to solve your problem:
1/ Historicised table with TTLs
Design a table that stores data under temporal keys and apply a TTL of 4 years to each row written.
For audit-only data (older than 2 years), filter on the date in your back-end API to block retrieval.
Suggested PK + CC: ((year, month, day), dataId)
Note that the TTL applies at the column level (see this Medium article).
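For illustration, a minimal CQL sketch of such a table; the keyspace, table, and column names are hypothetical. It uses the table-level default_time_to_live option so every written row inherits a 4-year TTL, with a per-write USING TTL shown as an alternative:

```sql
-- Hypothetical historicised table; names are illustrative.
-- default_time_to_live is in seconds; Cassandra expires each row
-- automatically once its TTL has elapsed.
CREATE TABLE IF NOT EXISTS audit.events_by_day (
    year    int,
    month   int,
    day     int,
    data_id uuid,
    payload text,
    PRIMARY KEY ((year, month, day), data_id)
) WITH default_time_to_live = 126230400;  -- ~4 years (4 * 365.25 * 86400)

-- Equivalent per-write TTL, which overrides the table default:
INSERT INTO audit.events_by_day (year, month, day, data_id, payload)
VALUES (2023, 6, 15, uuid(), 'example')
USING TTL 126230400;
```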
2/ Historicised table with scheduled purge
Same as the previous suggestion, but use a scheduled job instead of a TTL to purge data older than 4 years.
To achieve that, use the Spark Cassandra Connector to extract the historicised table from Cassandra, identify the rows to delete, and purge them in Cassandra; a sketch follows below.
I personally recommend the DataFrame API over RDDs (official GitHub project).
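A minimal sketch of that purge job, assuming the hypothetical audit.events_by_day table from the first suggestion (contact point and job wiring are illustrative). The DataFrame API does the scan and filtering; the final delete goes through the connector's RDD-level deleteFromCassandra helper, since the DataFrame writer of that era does not expose deletes:

```scala
import com.datastax.spark.connector._
import org.apache.spark.sql.SparkSession
import java.time.LocalDate

object AuditPurgeJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-audit-purge")
      .config("spark.cassandra.connection.host", "127.0.0.1") // assumed contact point
      .getOrCreate()
    import spark.implicits._

    val cutoff = LocalDate.now().minusYears(4)

    // Read the historicised table with the DataFrame API.
    val rows = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "audit", "table" -> "events_by_day"))
      .load()

    // Keep only the primary-key columns of rows older than the cutoff.
    val expired = rows
      .select($"year", $"month", $"day", $"data_id")
      .filter(
        $"year" < cutoff.getYear ||
        ($"year" === cutoff.getYear && $"month" < cutoff.getMonthValue) ||
        ($"year" === cutoff.getYear && $"month" === cutoff.getMonthValue &&
          $"day" < cutoff.getDayOfMonth))

    // Delete the matching rows by full primary key.
    expired.rdd
      .map(r => (r.getInt(0), r.getInt(1), r.getInt(2), r.getString(3)))
      .deleteFromCassandra("audit", "events_by_day",
        keyColumns = SomeColumns("year", "month", "day", "data_id"))

    spark.stop()
  }
}
```

One design note: because the partition key is (year, month, day), the deletes target whole days' partitions by key, which keeps tombstone impact predictable; schedule the job during low-traffic windows and let compaction reclaim the space.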