When to use UUID instead of millisecond timestamp in Cassandra?
Question
I have created a table in Cassandra whose primary key is a column of type timeuuid. I can already identify each record uniquely with just a millisecond-precision timestamp stored as a bigint.
I use the Java DataStax driver to connect to Cassandra. Before inserting a record into the database, I convert its millisecond timestamp into a UUID. This is overhead that could be removed.
- Can someone explain the benefits of using timeuuid instead of bigint, given that the records can be identified without timeuuid's uniqueness?
- Is there any performance difference between the timeuuid and bigint data types?
Answer 1
Score: 2
Generating a timeuuid from a timestamp should not have a large performance impact. timeuuid is useful when many events may happen within the same millisecond and you need them sorted: because the timestamp inside a version 1 UUID has 100-nanosecond resolution, you can get up to 10,000 different values inside one millisecond. A typical use case is a table with a structure like this:
create table tuuid (
  pk int,
  tuuid timeuuid,
  ....
  ....,
  primary key (pk, tuuid));
In this case you get sorting (ascending or descending) together with unique values for tuuid. You could of course use a primary key of (pk, timestamp, random-value), but with timeuuid you don't need an additional column for uniqueness. One drawback of timeuuid is integration with Spark, for example: Spark has no such type and may not be able to push filters down.
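The figure of 10,000 values per millisecond follows from the 100-nanosecond resolution of the timestamp in a version 1 UUID. The sketch below is only an illustration using Python's standard uuid module, not the DataStax driver's own conversion code; the helper names timeuuid_from_millis and millis_of, and the fixed clock_seq and node values, are assumptions made for the example.

```python
import uuid

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns ticks.
UUID_EPOCH_OFFSET_TICKS = 0x01B21DD213814000
TICKS_PER_MS = 10_000  # 1 ms = 10,000 intervals of 100 ns


def timeuuid_from_millis(millis, sub_ms_tick=0, clock_seq=0, node=0):
    """Build a version 1 UUID for a Unix timestamp in milliseconds.

    sub_ms_tick (0..9999) selects one of the 100 ns slots inside that millisecond.
    clock_seq and node are fixed placeholders here (an assumption for the sketch);
    a real generator fills them in to keep values unique across processes and hosts.
    """
    if not 0 <= sub_ms_tick < TICKS_PER_MS:
        raise ValueError("sub_ms_tick must be in 0..9999")
    ticks = millis * TICKS_PER_MS + sub_ms_tick + UUID_EPOCH_OFFSET_TICKS
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # set version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant bits
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def millis_of(u):
    """Recover the Unix millisecond timestamp embedded in a version 1 UUID."""
    return (u.time - UUID_EPOCH_OFFSET_TICKS) // TICKS_PER_MS


millis = 1_700_000_000_000  # arbitrary example timestamp
first = timeuuid_from_millis(millis, sub_ms_tick=0)
second = timeuuid_from_millis(millis, sub_ms_tick=1)

print(first != second)                        # two events in one millisecond get distinct keys
print(millis_of(first) == millis_of(second))  # both still map back to the same millisecond
print(first.time < second.time)               # their embedded timestamps keep the event order
```

With a plain millisecond bigint as the clustering column, those two events would share one key, and the second insert would overwrite the first.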
If you don't need uniqueness, just switch to timestamp. It is represented internally as an 8-byte long, the same as bigint, but you don't have to do the conversions yourself.
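The size difference can be checked with a small sketch. It only mimics the value sizes with Python's standard library (an 8-byte signed integer for bigint and timestamp, a 128-bit UUID for timeuuid) and does not talk to Cassandra; the example timestamp is arbitrary.

```python
import struct
import uuid
from datetime import datetime, timezone

millis = 1_700_000_000_000  # arbitrary example: Unix time in milliseconds

# A bigint and a timestamp both hold one signed 64-bit integer (big-endian here).
as_eight_bytes = struct.pack(">q", millis)

# A timeuuid is a full 128-bit UUID. uuid1() generates a version 1 UUID
# for the current time; it stands in for any timeuuid value here.
as_timeuuid = uuid.uuid1()

print(len(as_eight_bytes))     # size of a bigint/timestamp value in bytes
print(len(as_timeuuid.bytes))  # size of a timeuuid value in bytes

# The same milliseconds read back as a date, which is what a timestamp column
# represents without any manual conversion.
print(datetime.fromtimestamp(millis / 1000, tz=timezone.utc).isoformat())
```

So a timeuuid key takes twice the space of a timestamp or bigint key, in exchange for built-in uniqueness.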