When to use UUID instead of millisecond timestamp in Cassandra?

Question

I have created a table in Cassandra where the primary key is a column with the timeuuid data type. I am able to identify each record uniquely with just a millisecond-precision timestamp value stored as bigint.

I use the Java DataStax driver to connect to Cassandra. Before inserting each record into the database, I convert its millisecond timestamp into a UUID. This is overhead that could be removed.
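A minimal sketch of that conversion, assuming plain Java with no DataStax driver: a Cassandra timeuuid is a version-1 UUID whose 60-bit timestamp counts 100-nanosecond intervals since 1582-10-15, so a millisecond value has to be scaled and shifted before its bits are packed into the UUID. The class name `MillisToTimeUuid`, its methods, and the clock-sequence and node values are hypothetical placeholders. The last line of `main` shows the catch: a UUID derived only from a millisecond timestamp is exactly as unique as that timestamp.

```java
import java.util.UUID;

/** Hypothetical helper (not part of the DataStax driver): epoch millis -> version-1 UUID. */
final class MillisToTimeUuid {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    /** Epoch millis plus a sub-millisecond tick in [0, 9999] -> 60-bit UUID timestamp. */
    static long toUuidTimestamp(long epochMillis, int subMillisTicks) {
        // 1 ms = 10,000 ticks of 100 ns.
        return epochMillis * 10_000L + UUID_EPOCH_OFFSET + subMillisTicks;
    }

    /** Packs a UUID timestamp, 14-bit clock sequence and 48-bit node into a version-1 UUID. */
    static UUID build(long uuidTimestamp, int clockSeq, long node) {
        long msb = (uuidTimestamp << 32)                  // time_low  -> bits 63..32
                 | ((uuidTimestamp >>> 16) & 0xFFFF0000L) // time_mid  -> bits 31..16
                 | 0x1000L                                // version 1 -> bits 15..12
                 | ((uuidTimestamp >>> 48) & 0x0FFFL);    // time_hi   -> bits 11..0
        long lsb = 0x8000000000000000L                    // IETF variant (binary 10)
                 | ((long) (clockSeq & 0x3FFF) << 48)     // clock sequence
                 | (node & 0xFFFFFFFFFFFFL);              // node id
        return new UUID(msb, lsb);
    }

    /** Recovers epoch millis; UUID.timestamp() throws unless the UUID is version 1. */
    static long toEpochMillis(UUID timeUuid) {
        return (timeUuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
    }

    public static void main(String[] args) {
        long millis = 1602772649000L; // arbitrary example value
        UUID a = build(toUuidTimestamp(millis, 0), 0x1234, 0x0A0B0C0D0E0FL);
        UUID b = build(toUuidTimestamp(millis, 0), 0x1234, 0x0A0B0C0D0E0FL);
        System.out.println(a + " version=" + a.version());
        System.out.println(toEpochMillis(a) == millis); // true: round-trips to the same millisecond
        System.out.println(a.equals(b));                // true: same inputs, same UUID, no extra uniqueness
    }
}
```

Real time-based generators vary the sub-millisecond ticks, clock sequence and node so that values stay unique across calls and machines; a conversion that fixes them adds cost without adding uniqueness.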

  1. Can someone explain the benefits of using timeuuid instead of bigint, given that records can be identified without timeuuid's uniqueness?
  2. Is there any performance difference between the timeuuid and bigint data types?

Answer 1

Score: 2

Generating a timeuuid from a timestamp shouldn't have a big performance impact. timeuuid is useful if many events may happen in the same millisecond and you need them sorted: the timestamp inside a timeuuid has 100-nanosecond resolution, so you can get up to 10,000 distinct time values within one millisecond. A typical use case is a table with a structure like this:

create table tuuid (
  pk int,
  tuuid timeuuid,
  -- ... other columns ...
  primary key (pk, tuuid));

In this case, you get sorting (ascending or descending) together with uniqueness of the tuuid values. Of course, you could instead use a primary key of (pk, timestamp, random-value), but with timeuuid you don't need an additional column for uniqueness. One drawback of timeuuid is integration with Spark, for example: Spark doesn't have this type, so it may not be able to push filters down to Cassandra.
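The difference between the two key designs discussed above can be sketched in plain Java, as an illustration only, not a model of how Cassandra stores or compares values. `SameMillisecondOrdering`, `timeUuid` and `MillisAndRandom` are hypothetical names, and the sub-millisecond `tick` stands in for the extra time resolution a timeuuid generator can use. Both keys stay unique within one millisecond, but only the timeuuid version orders ties by time; the (timestamp, random-value) version orders them by the random column.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.UUID;

final class SameMillisecondOrdering {
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L; // 100-ns ticks, 1582-10-15 -> 1970-01-01

    /** Version-1 UUID from epoch millis plus a sub-millisecond tick in [0, 9999]. */
    static UUID timeUuid(long epochMillis, int tick, int clockSeq, long node) {
        long ts = epochMillis * 10_000L + UUID_EPOCH_OFFSET + tick;
        long msb = (ts << 32) | ((ts >>> 16) & 0xFFFF0000L) | 0x1000L | ((ts >>> 48) & 0x0FFFL);
        long lsb = 0x8000000000000000L | ((long) (clockSeq & 0x3FFF) << 48) | (node & 0xFFFFFFFFFFFFL);
        return new UUID(msb, lsb);
    }

    // Time-first order, similar in spirit to a timeuuid clustering column.
    // UUID.compareTo is not generally time-ordered for version-1 UUIDs, so compare timestamp().
    static final Comparator<UUID> BY_TIME = Comparator.comparingLong(UUID::timestamp);

    /** Alternative clustering key (timestamp, random-value): the extra column only breaks ties. */
    static final class MillisAndRandom {
        final long epochMillis;
        final long randomValue;

        MillisAndRandom(long epochMillis, long randomValue) {
            this.epochMillis = epochMillis;
            this.randomValue = randomValue;
        }
    }

    static final Comparator<MillisAndRandom> BY_MILLIS_THEN_RANDOM =
            Comparator.comparingLong((MillisAndRandom k) -> k.epochMillis)
                      .thenComparingLong(k -> k.randomValue);

    public static void main(String[] args) {
        long millis = 1602772649000L; // arbitrary example value
        Random rnd = new Random();
        List<UUID> uuids = new ArrayList<>();
        List<MillisAndRandom> composite = new ArrayList<>();
        // Three events generated one after another (ticks 0, 1, 2) within the same millisecond.
        for (int tick = 0; tick < 3; tick++) {
            uuids.add(timeUuid(millis, tick, 0x0042, 0x0A0B0C0D0E0FL));
            composite.add(new MillisAndRandom(millis, rnd.nextLong()));
        }
        uuids.sort(BY_TIME);
        composite.sort(BY_MILLIS_THEN_RANDOM);
        // timeuuid keys sort by tick, i.e. by sub-millisecond time ...
        for (UUID u : uuids) {
            System.out.println("tick " + (u.timestamp() - UUID_EPOCH_OFFSET) % 10_000L);
        }
        // ... while (millis, random) keys are unique but arbitrarily ordered within the millisecond.
        for (MillisAndRandom k : composite) {
            System.out.println(k.epochMillis + " / " + k.randomValue);
        }
    }
}
```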

If you don't need uniqueness, just switch to timestamp: internally it's represented as an 8-byte long, the same as bigint, but you don't need to do the conversions yourself.
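To illustrate the size comparison, here is a sketch of the serialized forms, based on the CQL native protocol, where bigint and timestamp are both 8-byte two's-complement integers (for timestamp, milliseconds since the Unix epoch) and a timeuuid is 16 bytes. `EncodingSizes` and its helpers are hypothetical; this is not driver code.

```java
import java.nio.ByteBuffer;
import java.time.Instant;
import java.util.Arrays;
import java.util.UUID;

final class EncodingSizes {
    /** 8-byte big-endian two's-complement long: the wire form of both CQL bigint and CQL timestamp. */
    static byte[] asLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** 16 bytes: most significant 64 bits followed by least significant 64 bits. */
    static byte[] asUuid(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        Instant createdAt = Instant.ofEpochMilli(1602772649000L); // arbitrary example value
        byte[] asBigint = asLong(createdAt.toEpochMilli());
        byte[] asTimestamp = asLong(createdAt.toEpochMilli());
        System.out.println(asBigint.length);                      // 8
        System.out.println(Arrays.equals(asBigint, asTimestamp)); // true: same bytes, different CQL type
        System.out.println(asUuid(UUID.randomUUID()).length);     // 16
        // Decoding back to an Instant: the step a timestamp column lets the driver do for you.
        Instant decoded = Instant.ofEpochMilli(ByteBuffer.wrap(asTimestamp).getLong());
        System.out.println(decoded.equals(createdAt));             // true
    }
}
```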

huangapple
  • Published on 2020-10-15 22:37:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64373999.html