我能否从另一个ClickHouse中的物化视图创建一个物化视图?

huangapple go评论55阅读模式
英文:

Can I create a Materialized View from another Matrialized View in Clickhouse?

问题

以下是您要的部分翻译:

"The tile pretty much says it. I want to create a Materialized View whose "SELECT" clause SELECTs data from another Materialized View in Clickhouse. I have tried this. The SQL for "createion" fo the two views runs without an error. But upon runtime, the first view is populated, but the second one isn't.

I need to know if I am making a mistake in my SQL or this is just simply not possible.

Here's my two views:

CREATE MATERIALIZED VIEW IF NOT EXISTS production_gross
            ENGINE = ReplacingMergeTree
                ORDER BY (profile_type, reservoir, case_tag, variable_name, profile_phase, well_name, case_name,
                          timestamp) POPULATE
AS
SELECT profile_type,
       reservoir,
       case_tag,
       is_endorsed,
       toDateTime64(endorsement_date / 1000.0, 0) AS endorsement_date,
       endorsed_for_month,
       variable_name,
       profile_phase,
       well_name,
       case_name,
       asset_id,
       toDateTime64(eoh / 1000, 0)                as end_of_history,
       toDateTime64(ts / 1000, 0)                 as timestamp,
       value,                                     -- AS rate,  -- cubic meters per second rate for this month
       value * dateDiff('second',
                        toStartOfMonth(subtractMonths(now(), 1)),
                        toStartOfMonth(now()))    AS volume -- cubic meters volume for this month

FROM (
         SELECT pp.profile_type                                                                   AS profile_type,
                trimBoth(splitByChar('-', case_name)[1])                                          AS reservoir,
                JSONExtractString(cd.data, 'case_data', 'Tags$$Tag')                              AS case_tag,
                JSONExtractString(cd.data, 'case_data', 'Tags$$Endorsed')                         AS is_endorsed,
                -- Endorsement Data, is the timestamp when the user "endorsed" the case
                JSONExtract(cd.data, 'case_data', 'Tags$$EndorsementDate', 'time_stamp', 'Int64') AS endorsement_date,
                -- Endorsement Month is the month of year for which the case was actually endorsed
                JSONExtractString(cd.data, 'case_data', 'Tags$$MonthTags')                        AS endorsed_for_month,
                pp.variable_name                                                                  AS variable_name,
                JSONExtractString(pp.data, 'profile_phase')                                       AS profile_phase,
                JSONExtractString(wd.data, 'name')                                                AS well_name,
                JSONExtractString(cd.data, 'header', 'name')                                      AS case_name,
                -- We might want to have asset id here to use in roll-up
                JSONExtract(cd.data, 'header', 'reservoir_asset_id', 'Int64')                     AS asset_id, -- Asset Id in ARM
                JSONExtract(pp.data, 'end_of_history', 'Int64')                                   AS end_of_history,
                JSONExtract(pp.data, 'values', 'Array(Float64)')                                  AS values,
                JSONExtract(pp.data, 'timestamps', 'Array(Int64)')                                AS timestamps,
                JSONExtract(pp.data, 'end_of_history', 'Int64')                                   AS eoh
         FROM production_profile AS pp
                  INNER JOIN well_data AS wd ON wd.uuid = pp.well_id
                  INNER JOIN case_data AS cd ON cd.uuid = pp.case_id
         )
    ARRAY JOIN
     values AS value,
     timestamps AS ts
;

CREATE MATERIALIZED VIEW IF NOT EXISTS production_volume_actual
            ENGINE = ReplacingMergeTree
                ORDER BY (asset_id,
                          case_tag,
                          variable_name,
                          endorsement_date) POPULATE
AS
SELECT profile_type,
       case_tag,
       is_endorsed,
       endorsement_date,
       endorsed_for_month,
       variable_name,
       profile_phase,
       asset_id,
       sum(volume) AS total_actual_volume
FROM production_gross
WHERE timestamp < end_of_history
GROUP BY profile_type,
         case_tag,
         is_endorsed,
         endorsement_date,
         endorsed_for_month,
         variable_name,
         profile_phase,
         asset_id
ORDER BY asset_id ASC,
         case_tag ASC,
         variable_name ASC,
         endorsement_date ASC
;

As you can see, the second view is an "aggregation" on the first, and that is why I need it. If I want to do the aggregation from scratch, a lot of processes has to be done twice.

Update:
I have tried to change the query to the following:

SELECT ...
FROM `.inner.production_gross`
...

Which did not help. This query resulted in the following error:

Code: 60. DB::Exception: Table default.`.inner.production_gross` doesn't exist.

Then, based on the comment by @DennyCrane and using this answer: https://stackoverflow.com/a/67709334/959156, I run this query:

SELECT
    uuid,
    name
FROM system.tables
WHERE database = 'default' AND engine = 'MaterializedView'

Which gave me the uuid of the inner table:

ebab2dc5-2887-4e7d-998d-6acaff122fc7

So, I ran this query:

SELECT ...
FROM `.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7`

Which resulted in the following error:

Code: 60. DB::Exception: Table default.`.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7` doesn't exist.
英文:

The tile pretty much says it. I want to create a Materialized View whose "SELECT" clause SELECTs data from another Materialized View in Clickhouse. I have tried this. The SQL for "createion" fo the two views runs without an error. But upon runtime, the first view is populated, but the second one isn't.

I need to know if I am making a mistake in my SQL or this is just simply not possible.

Here's my two views:

CREATE MATERIALIZED VIEW IF NOT EXISTS production_gross
ENGINE = ReplacingMergeTree
ORDER BY (profile_type, reservoir, case_tag, variable_name, profile_phase, well_name, case_name,
timestamp) POPULATE
AS
SELECT profile_type,
reservoir,
case_tag,
is_endorsed,
toDateTime64(endorsement_date / 1000.0, 0) AS endorsement_date,
endorsed_for_month,
variable_name,
profile_phase,
well_name,
case_name,
asset_id,
toDateTime64(eoh / 1000, 0)                as end_of_history,
toDateTime64(ts / 1000, 0)                 as timestamp,
value,                                     -- AS rate,  -- cubic meters per second rate for this month
value * dateDiff('second',
toStartOfMonth(subtractMonths(now(), 1)),
toStartOfMonth(now()))    AS volume -- cubic meters volume for this month
FROM (
SELECT pp.profile_type                                                                   AS profile_type,
trimBoth(splitByChar('-', case_name)[1])                                          AS reservoir,
JSONExtractString(cd.data, 'case_data', 'Tags$$Tag')                              AS case_tag,
JSONExtractString(cd.data, 'case_data', 'Tags$$Endorsed')                         AS is_endorsed,
-- Endorsement Data, is the timestamp when the user "endorsed" the case
JSONExtract(cd.data, 'case_data', 'Tags$$EndorsementDate', 'time_stamp', 'Int64') AS endorsement_date,
-- Endorsement Month is the month of year for which the case was actually endorsed
JSONExtractString(cd.data, 'case_data', 'Tags$$MonthTags')                        AS endorsed_for_month,
pp.variable_name                                                                  AS variable_name,
JSONExtractString(pp.data, 'profile_phase')                                       AS profile_phase,
JSONExtractString(wd.data, 'name')                                                AS well_name,
JSONExtractString(cd.data, 'header', 'name')                                      AS case_name,
-- We might want to have asset id here to use in roll-up
JSONExtract(cd.data, 'header', 'reservoir_asset_id', 'Int64')                     AS asset_id, -- Asset Id in ARM
JSONExtract(pp.data, 'end_of_history', 'Int64')                                   AS end_of_history,
JSONExtract(pp.data, 'values', 'Array(Float64)')                                  AS values,
JSONExtract(pp.data, 'timestamps', 'Array(Int64)')                                AS timestamps,
JSONExtract(pp.data, 'end_of_history', 'Int64')                                   AS eoh
FROM production_profile AS pp
INNER JOIN well_data AS wd ON wd.uuid = pp.well_id
INNER JOIN case_data AS cd ON cd.uuid = pp.case_id
)
ARRAY JOIN
values AS value,
timestamps AS ts
;
CREATE MATERIALIZED VIEW IF NOT EXISTS production_volume_actual
ENGINE = ReplacingMergeTree
ORDER BY (asset_id,
case_tag,
variable_name,
endorsement_date) POPULATE
AS
SELECT profile_type,
case_tag,
is_endorsed,
endorsement_date,
endorsed_for_month,
variable_name,
profile_phase,
asset_id,
sum(volume) AS total_actual_volume
FROM production_gross
WHERE timestamp < end_of_history
GROUP BY profile_type,
case_tag,
is_endorsed,
endorsement_date,
endorsed_for_month,
variable_name,
profile_phase,
asset_id
ORDER BY asset_id ASC,
case_tag ASC,
variable_name ASC,
endorsement_date ASC
;

As you can see, the second view is an "aggregation" on the first, and that is why I need it. If I want to do the aggregation from scratch, a lot of processes has to be done twice.

Update:
I have tried to change the query to the following:

SELECT ...
FROM `.inner.production_gross`
...

Which did not help. This query resulted in the following error:

Code: 60. DB::Exception: Table default.`.inner.production_gross` doesn't exist.

Then, based on the comment by @DennyCrane and using this answer: https://stackoverflow.com/a/67709334/959156, I run this query:

SELECT
uuid,
name
FROM system.tables
WHERE database = 'default' AND engine = 'MaterializedView'

Which gave me the uuid of the inner table:

ebab2dc5-2887-4e7d-998d-6acaff122fc7

So, I ran this query:

SELECT ...
FROM `.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7`

Which resulted in the following error:

Code: 60. DB::Exception: Table default.`.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7` doesn't exist.

答案1

得分: 2

材质化视图在实际数据表上充当插入触发器,因此您的 production_volume_actual 表必须对数据表执行 SELECT 操作,而不是一个 "视图"。

如果您使用引擎(而不是 TO 另一个数据表)来创建一个材质化视图,在旧版本中 ClickHouse 实际上会创建一个名为 .inner.<mv_name> 的数据表(不使用 Atomic 数据库引擎),或者如果使用 Atomic 或 Replicated 数据库引擎,则会创建一个名为 .inner_id.<some UUID>. 的数据表。因此,如果您将第二个视图中的 SELECT 更改为这个 "inner" 表名,可以使用以下方式:

select from `.inner.production_gross`
select from `.inner_id.<UUID>`  -- 请注意 "inner" 上的额外 '_id'

它应该可以正常工作。

这个 答案 可以指导您找到正确的 UUID。

在 ClickHouse,我们实际上建议您始终将材质化视图创建为 TO <second_table>,以避免这种混淆,并使对 <second_table> 的操作更简单和更透明。

(感谢 OP Mostafa Zeinali 和 Denny Crane 对于最近的 ClickHouse 版本的澄清)

英文:

Materialized views work as insert triggers on actual data tables, so your production_volume_actual table has to do a SELECT on a data table, not a "view".

If you CREATE a materialized view using an ENGINE (and not as TO another data table), ClickHouse actually creates a data table with the name .inner.<mv_name> on older versions (not using an Atomic database engine), or .inner_id.<some UUID>. if using an Atomic or Replicated database engine. So if you change the select in your second view to this "inner" table name, either:

select from `.inner.production_gross`
select from `.inner_id.<UUID>`  -- note the extra '_id' on 'inner'

It should work.

This answer can point you to the right UUID.

At ClickHouse we actually recommend you always create Materialized Views as TO <second_table> to avoid this kind of confusion, and to make operations on <second_table> simpler and more transparent.

(Thanks to OP Mostafa Zeinali and Denny Crane for the clarification for more recent ClickHouse versions)

huangapple
  • 本文由 发表于 2023年1月9日 00:59:46
  • 转载请务必保留本文链接:https://go.coder-hub.com/75049706.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定