Does ClickHouse store ReplacingMergeTree table data ordered by primary key?

Question

I want to create a table like this:

CREATE TABLE products ON CLUSTER cluster
(
    Id           UUID,
    Code         String,
    DownloadedAt DateTime,
    ...
    (~30 more columns)
)
ENGINE = ReplicatedReplacingMergeTree()
ORDER BY (Id, Code)
PRIMARY KEY (Id, Code, DownloadedAt);

I expect the data to be updated based on item Id and Code: if I insert a new product with the same Id and Code, the existing product should be replaced, including its DownloadedAt column.

But I also want products to be stored ordered by DownloadedAt, as with a clustered index.
As far as I can see in the documentation, ClickHouse stores MergeTree data sorted by primary key, and ReplacingMergeTree differs from MergeTree in that it removes duplicate entries with the same sorting key value.

Does my table definition satisfy both of these conditions? Is it OK to create the table like this?

Answer 1

Score: 1

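You don't need to specify both PRIMARY KEY and ORDER BY: if PRIMARY KEY is omitted, ClickHouse uses the ORDER BY expression as the primary key. When both are given, the primary key must be a prefix of the sorting key, so a PRIMARY KEY that is longer than ORDER BY, as in the question, is not accepted.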
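For reference, when both clauses are present ClickHouse expects the primary key to be a prefix of the sorting key. The following is a minimal sketch of a combination that satisfies that rule; the table name `products_key_sketch` is hypothetical, and a non-replicated engine without ON CLUSTER is assumed only to keep the example short.

```sql
-- Hypothetical table name; non-replicated engine used to keep the sketch short.
CREATE TABLE products_key_sketch
(
    Id           UUID,
    Code         String,
    DownloadedAt DateTime
)
ENGINE = ReplacingMergeTree()
ORDER BY (Id, Code, DownloadedAt)  -- sorting key: row order within each data part
PRIMARY KEY (Id, Code);            -- index key: must be a prefix of ORDER BY
```

Dropping the PRIMARY KEY line would make the primary key equal to the full sorting key.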

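If you will be filtering on all of those columns, use this ORDER BY clause:
ORDER BY (Id, Code, DownloadedAt);
Keep in mind that ReplacingMergeTree deduplicates by the sorting key, so with this key only rows that match on Id, Code and DownloadedAt are collapsed; to replace rows by Id and Code alone, the sorting key has to stay (Id, Code).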

ReplacingMergeTree also accepts an optional version column. If rows with the same Id and Code can be inserted more than once, you may want to pass DownloadedAt as the version, so that the row with the latest DownloadedAt is the one kept when duplicates are merged:
ENGINE = ReplicatedReplacingMergeTree(DownloadedAt)

https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree
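As a rough illustration of the version-column approach, here is a minimal single-node sketch. It assumes a hypothetical table `products_sketch`, uses the non-replicated ReplacingMergeTree instead of the ReplicatedReplacingMergeTree ON CLUSTER setup from the question, and leaves out the other columns. The sorting key is kept as (Id, Code) so that rows are replaced per product.

```sql
-- Simplified single-node sketch; products_sketch is a hypothetical name.
CREATE TABLE products_sketch
(
    Id           UUID,
    Code         String,
    DownloadedAt DateTime
)
ENGINE = ReplacingMergeTree(DownloadedAt)  -- version column: the row with the greatest value is kept on merge
ORDER BY (Id, Code);                       -- sorting key, also used as the deduplication key

-- Two inserts for the same (Id, Code) with different download times.
INSERT INTO products_sketch VALUES
    ('00000000-0000-0000-0000-000000000001', 'A', '2023-07-01 00:00:00');
INSERT INTO products_sketch VALUES
    ('00000000-0000-0000-0000-000000000001', 'A', '2023-07-13 00:00:00');

-- Duplicates are removed during background merges, so a plain SELECT
-- may still see both rows; FINAL collapses them at query time.
SELECT Id, Code, DownloadedAt
FROM products_sketch FINAL
ORDER BY DownloadedAt;  -- ordering by DownloadedAt is done by the query, not by storage
```

With this layout the data on disk is ordered by (Id, Code), not by DownloadedAt, so sorting by download time has to be requested in the query itself.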

Hope that helps - please let us know

Posted by huangapple on 2023-07-14 00:19:28
Source: https://go.coder-hub.com/76681482.html