英文:
Can I create a Materialized View from another Matrialized View in Clickhouse?
问题
以下是您要的部分翻译:
"The tile pretty much says it. I want to create a Materialized View whose "SELECT" clause SELECTs data from another Materialized View in Clickhouse. I have tried this. The SQL for "createion" fo the two views runs without an error. But upon runtime, the first view is populated, but the second one isn't.
I need to know if I am making a mistake in my SQL or this is just simply not possible.
Here's my two views:
CREATE MATERIALIZED VIEW IF NOT EXISTS production_gross
ENGINE = ReplacingMergeTree
ORDER BY (profile_type, reservoir, case_tag, variable_name, profile_phase, well_name, case_name,
timestamp) POPULATE
AS
SELECT profile_type,
reservoir,
case_tag,
is_endorsed,
toDateTime64(endorsement_date / 1000.0, 0) AS endorsement_date,
endorsed_for_month,
variable_name,
profile_phase,
well_name,
case_name,
asset_id,
toDateTime64(eoh / 1000, 0) as end_of_history,
toDateTime64(ts / 1000, 0) as timestamp,
value, -- AS rate, -- cubic meters per second rate for this month
value * dateDiff('second',
toStartOfMonth(subtractMonths(now(), 1)),
toStartOfMonth(now())) AS volume -- cubic meters volume for this month
FROM (
SELECT pp.profile_type AS profile_type,
trimBoth(splitByChar('-', case_name)[1]) AS reservoir,
JSONExtractString(cd.data, 'case_data', 'Tags$$Tag') AS case_tag,
JSONExtractString(cd.data, 'case_data', 'Tags$$Endorsed') AS is_endorsed,
-- Endorsement Data, is the timestamp when the user "endorsed" the case
JSONExtract(cd.data, 'case_data', 'Tags$$EndorsementDate', 'time_stamp', 'Int64') AS endorsement_date,
-- Endorsement Month is the month of year for which the case was actually endorsed
JSONExtractString(cd.data, 'case_data', 'Tags$$MonthTags') AS endorsed_for_month,
pp.variable_name AS variable_name,
JSONExtractString(pp.data, 'profile_phase') AS profile_phase,
JSONExtractString(wd.data, 'name') AS well_name,
JSONExtractString(cd.data, 'header', 'name') AS case_name,
-- We might want to have asset id here to use in roll-up
JSONExtract(cd.data, 'header', 'reservoir_asset_id', 'Int64') AS asset_id, -- Asset Id in ARM
JSONExtract(pp.data, 'end_of_history', 'Int64') AS end_of_history,
JSONExtract(pp.data, 'values', 'Array(Float64)') AS values,
JSONExtract(pp.data, 'timestamps', 'Array(Int64)') AS timestamps,
JSONExtract(pp.data, 'end_of_history', 'Int64') AS eoh
FROM production_profile AS pp
INNER JOIN well_data AS wd ON wd.uuid = pp.well_id
INNER JOIN case_data AS cd ON cd.uuid = pp.case_id
)
ARRAY JOIN
values AS value,
timestamps AS ts
;
CREATE MATERIALIZED VIEW IF NOT EXISTS production_volume_actual
ENGINE = ReplacingMergeTree
ORDER BY (asset_id,
case_tag,
variable_name,
endorsement_date) POPULATE
AS
SELECT profile_type,
case_tag,
is_endorsed,
endorsement_date,
endorsed_for_month,
variable_name,
profile_phase,
asset_id,
sum(volume) AS total_actual_volume
FROM production_gross
WHERE timestamp < end_of_history
GROUP BY profile_type,
case_tag,
is_endorsed,
endorsement_date,
endorsed_for_month,
variable_name,
profile_phase,
asset_id
ORDER BY asset_id ASC,
case_tag ASC,
variable_name ASC,
endorsement_date ASC
;
As you can see, the second view is an "aggregation" on the first, and that is why I need it. If I want to do the aggregation from scratch, a lot of processes has to be done twice.
Update:
I have tried to change the query to the following:
SELECT ...
FROM `.inner.production_gross`
...
Which did not help. This query resulted in the following error:
Code: 60. DB::Exception: Table default.`.inner.production_gross` doesn't exist.
Then, based on the comment by @DennyCrane and using this answer: https://stackoverflow.com/a/67709334/959156, I run this query:
SELECT
uuid,
name
FROM system.tables
WHERE database = 'default' AND engine = 'MaterializedView'
Which gave me the uuid of the inner table:
ebab2dc5-2887-4e7d-998d-6acaff122fc7
So, I ran this query:
SELECT ...
FROM `.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7`
Which resulted in the following error:
Code: 60. DB::Exception: Table default.`.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7` doesn't exist.
英文:
The tile pretty much says it. I want to create a Materialized View whose "SELECT" clause SELECTs data from another Materialized View in Clickhouse. I have tried this. The SQL for "createion" fo the two views runs without an error. But upon runtime, the first view is populated, but the second one isn't.
I need to know if I am making a mistake in my SQL or this is just simply not possible.
Here's my two views:
CREATE MATERIALIZED VIEW IF NOT EXISTS production_gross
ENGINE = ReplacingMergeTree
ORDER BY (profile_type, reservoir, case_tag, variable_name, profile_phase, well_name, case_name,
timestamp) POPULATE
AS
SELECT profile_type,
reservoir,
case_tag,
is_endorsed,
toDateTime64(endorsement_date / 1000.0, 0) AS endorsement_date,
endorsed_for_month,
variable_name,
profile_phase,
well_name,
case_name,
asset_id,
toDateTime64(eoh / 1000, 0) as end_of_history,
toDateTime64(ts / 1000, 0) as timestamp,
value, -- AS rate, -- cubic meters per second rate for this month
value * dateDiff('second',
toStartOfMonth(subtractMonths(now(), 1)),
toStartOfMonth(now())) AS volume -- cubic meters volume for this month
FROM (
SELECT pp.profile_type AS profile_type,
trimBoth(splitByChar('-', case_name)[1]) AS reservoir,
JSONExtractString(cd.data, 'case_data', 'Tags$$Tag') AS case_tag,
JSONExtractString(cd.data, 'case_data', 'Tags$$Endorsed') AS is_endorsed,
-- Endorsement Data, is the timestamp when the user "endorsed" the case
JSONExtract(cd.data, 'case_data', 'Tags$$EndorsementDate', 'time_stamp', 'Int64') AS endorsement_date,
-- Endorsement Month is the month of year for which the case was actually endorsed
JSONExtractString(cd.data, 'case_data', 'Tags$$MonthTags') AS endorsed_for_month,
pp.variable_name AS variable_name,
JSONExtractString(pp.data, 'profile_phase') AS profile_phase,
JSONExtractString(wd.data, 'name') AS well_name,
JSONExtractString(cd.data, 'header', 'name') AS case_name,
-- We might want to have asset id here to use in roll-up
JSONExtract(cd.data, 'header', 'reservoir_asset_id', 'Int64') AS asset_id, -- Asset Id in ARM
JSONExtract(pp.data, 'end_of_history', 'Int64') AS end_of_history,
JSONExtract(pp.data, 'values', 'Array(Float64)') AS values,
JSONExtract(pp.data, 'timestamps', 'Array(Int64)') AS timestamps,
JSONExtract(pp.data, 'end_of_history', 'Int64') AS eoh
FROM production_profile AS pp
INNER JOIN well_data AS wd ON wd.uuid = pp.well_id
INNER JOIN case_data AS cd ON cd.uuid = pp.case_id
)
ARRAY JOIN
values AS value,
timestamps AS ts
;
CREATE MATERIALIZED VIEW IF NOT EXISTS production_volume_actual
ENGINE = ReplacingMergeTree
ORDER BY (asset_id,
case_tag,
variable_name,
endorsement_date) POPULATE
AS
SELECT profile_type,
case_tag,
is_endorsed,
endorsement_date,
endorsed_for_month,
variable_name,
profile_phase,
asset_id,
sum(volume) AS total_actual_volume
FROM production_gross
WHERE timestamp < end_of_history
GROUP BY profile_type,
case_tag,
is_endorsed,
endorsement_date,
endorsed_for_month,
variable_name,
profile_phase,
asset_id
ORDER BY asset_id ASC,
case_tag ASC,
variable_name ASC,
endorsement_date ASC
;
As you can see, the second view is an "aggregation" on the first, and that is why I need it. If I want to do the aggregation from scratch, a lot of processes has to be done twice.
Update:
I have tried to change the query to the following:
SELECT ...
FROM `.inner.production_gross`
...
Which did not help. This query resulted in the following error:
Code: 60. DB::Exception: Table default.`.inner.production_gross` doesn't exist.
Then, based on the comment by @DennyCrane and using this answer: https://stackoverflow.com/a/67709334/959156, I run this query:
SELECT
uuid,
name
FROM system.tables
WHERE database = 'default' AND engine = 'MaterializedView'
Which gave me the uuid of the inner table:
ebab2dc5-2887-4e7d-998d-6acaff122fc7
So, I ran this query:
SELECT ...
FROM `.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7`
Which resulted in the following error:
Code: 60. DB::Exception: Table default.`.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7` doesn't exist.
答案1
得分: 2
材质化视图在实际数据表上充当插入触发器,因此您的 production_volume_actual
表必须对数据表执行 SELECT 操作,而不是一个 "视图"。
如果您使用引擎(而不是 TO
另一个数据表)来创建一个材质化视图,在旧版本中 ClickHouse 实际上会创建一个名为 .inner.<mv_name>
的数据表(不使用 Atomic 数据库引擎),或者如果使用 Atomic 或 Replicated 数据库引擎,则会创建一个名为 .inner_id.<some UUID>.
的数据表。因此,如果您将第二个视图中的 SELECT 更改为这个 "inner" 表名,可以使用以下方式:
select from `.inner.production_gross`
select from `.inner_id.<UUID>` -- 请注意 "inner" 上的额外 '_id'
它应该可以正常工作。
这个 答案 可以指导您找到正确的 UUID。
在 ClickHouse,我们实际上建议您始终将材质化视图创建为 TO <second_table>
,以避免这种混淆,并使对 <second_table>
的操作更简单和更透明。
(感谢 OP Mostafa Zeinali 和 Denny Crane 对于最近的 ClickHouse 版本的澄清)
英文:
Materialized views work as insert triggers on actual data tables, so your production_volume_actual
table has to do a SELECT on a data table, not a "view".
If you CREATE
a materialized view using an ENGINE (and not as TO
another data table), ClickHouse actually creates a data table with the name .inner.<mv_name>
on older versions (not using an Atomic database engine), or .inner_id.<some UUID>.
if using an Atomic or Replicated database engine. So if you change the select in your second view to this "inner" table name, either:
select from `.inner.production_gross`
select from `.inner_id.<UUID>` -- note the extra '_id' on 'inner'
It should work.
This answer can point you to the right UUID.
At ClickHouse we actually recommend you always create Materialized Views as TO <second_table>
to avoid this kind of confusion, and to make operations on <second_table>
simpler and more transparent.
(Thanks to OP Mostafa Zeinali and Denny Crane for the clarification for more recent ClickHouse versions)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论