October 15, 2020, 22:37:29

When to use a UUID instead of a millisecond timestamp in Cassandra?

Question

I have created a table in Cassandra whose primary key is a column of type timeuuid. I can already identify each record uniquely with just a millisecond-precision timestamp stored as a bigint.

I use the Java DataStax driver to connect to Cassandra. Before inserting a record into the database, I convert its millisecond timestamp into a UUID. This conversion is overhead that could be removed.

Can someone explain the benefits of using timeuuid instead of bigint, given that the records can be identified without timeuuid's uniqueness?
Is there any performance difference between the timeuuid and bigint data types?

Answer 1

Score: 2

Generating a timeuuid from a timestamp should not have a large performance impact. timeuuid is useful when many events can happen in the same millisecond and you need them sorted: a timeuuid timestamp has 100-nanosecond resolution, so you can get up to 10,000 different values within one millisecond. A typical use case is a table with a structure like this:

create table tuuid (
  pk int,
  tuuid timeuuid, 
  ....
  ....,
  primary key (pk, tuuid));

In this case, you get sorting (ascending or descending) together with uniqueness of the tuuid values. Of course, you could use a primary key of (pk, timestamp, random-value) instead, but with timeuuid you don't need an additional column for uniqueness. One drawback of timeuuid is integration with other tools such as Spark: Spark has no such type, so it may not be able to push filters down to Cassandra.
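To make the 10,000-values-per-millisecond figure concrete, here is a minimal sketch that packs a millisecond timestamp plus a sub-millisecond tick (0 to 9999) into a version 1 UUID using only java.util.UUID. The class name TimeUuidSketch, the helper names, and the fixed clockSeqAndNode value are hypothetical and chosen for illustration; in a real application the DataStax driver's own time-UUID helpers would normally be used instead of hand-rolled bit packing.

```java
import java.util.UUID;

final class TimeUuidSketch {
    // Number of 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    // A version 1 UUID counts time in 100-ns ticks, so one millisecond holds 10,000 ticks.
    static final long TICKS_PER_MILLI = 10_000L;

    // Builds a version 1 UUID; clockSeqAndNode is an illustrative stand-in for the
    // clock sequence and node fields that a real generator would fill in.
    static UUID timeUuid(long epochMillis, int subMilliTick, long clockSeqAndNode) {
        if (subMilliTick < 0 || subMilliTick >= TICKS_PER_MILLI) {
            throw new IllegalArgumentException("subMilliTick must be in [0, 9999]");
        }
        long ts = epochMillis * TICKS_PER_MILLI + UUID_EPOCH_OFFSET + subMilliTick;
        long msb = ((ts & 0xFFFFFFFFL) << 32)          // time_low
                 | (((ts >>> 32) & 0xFFFFL) << 16)     // time_mid
                 | 0x1000L                              // version 1
                 | ((ts >>> 48) & 0x0FFFL);            // time_hi
        long lsb = 0x8000000000000000L                  // variant bits "10"
                 | (clockSeqAndNode & 0x3FFFFFFFFFFFFFFFL);
        return new UUID(msb, lsb);
    }

    // Recovers the Unix millisecond timestamp, discarding the sub-millisecond tick.
    static long unixMillis(UUID u) {
        return (u.timestamp() - UUID_EPOCH_OFFSET) / TICKS_PER_MILLI;
    }

    public static void main(String[] args) {
        long millis = 1602801449000L; // arbitrary example value
        UUID first = timeUuid(millis, 0, 42L);
        UUID second = timeUuid(millis, 1, 42L);
        // Both values fall in the same millisecond but remain distinct and ordered by timestamp.
        System.out.println(first + " -> " + unixMillis(first));
        System.out.println(second + " -> " + unixMillis(second));
        System.out.println(first.timestamp() < second.timestamp());
    }
}
```

Note that the sketch compares timestamp() values rather than calling UUID.compareTo, because Java's UUID.compareTo does not order version 1 UUIDs chronologically, whereas a timeuuid clustering column is sorted by the embedded time.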

If you don't need uniqueness, just switch to timestamp. Internally it is represented as an 8-byte long, the same as bigint, but you don't need to do the conversions yourself.
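On the application side, a millisecond value that is already held as a long maps directly onto java.time.Instant, which is one way to hand a timestamp to a driver without any UUID conversion. The sketch below uses only the JDK; the class name TimestampSketch is hypothetical, and which Java type a given DataStax driver version binds to a timestamp column is an assumption to verify against that driver's documentation.

```java
import java.time.Instant;

final class TimestampSketch {
    // Wraps an epoch-millisecond value; no UUID packing is involved.
    static Instant toInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    // Unwraps back to the same 8-byte long that a bigint column would hold.
    static long toMillis(Instant instant) {
        return instant.toEpochMilli();
    }

    public static void main(String[] args) {
        long millis = 1602801449123L; // arbitrary example value
        Instant instant = toInstant(millis);
        System.out.println(instant);
        System.out.println(toMillis(instant) == millis);
        System.out.println(Long.BYTES); // size in bytes of a Java long
    }
}
```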
