2023年1月9日 00:59:46go评论55阅读模式

英文:

Can I create a Materialized View from another Matrialized View in Clickhouse?

问题

以下是您要的部分翻译：

"The tile pretty much says it. I want to create a Materialized View whose "SELECT" clause SELECTs data from another Materialized View in Clickhouse. I have tried this. The SQL for "createion" fo the two views runs without an error. But upon runtime, the first view is populated, but the second one isn't.

I need to know if I am making a mistake in my SQL or this is just simply not possible.

Here's my two views:

CREATE MATERIALIZED VIEW IF NOT EXISTS production_gross
            ENGINE = ReplacingMergeTree
                ORDER BY (profile_type, reservoir, case_tag, variable_name, profile_phase, well_name, case_name,
                          timestamp) POPULATE
AS
SELECT profile_type,
       reservoir,
       case_tag,
       is_endorsed,
       toDateTime64(endorsement_date / 1000.0, 0) AS endorsement_date,
       endorsed_for_month,
       variable_name,
       profile_phase,
       well_name,
       case_name,
       asset_id,
       toDateTime64(eoh / 1000, 0)                as end_of_history,
       toDateTime64(ts / 1000, 0)                 as timestamp,
       value,                                     -- AS rate,  -- cubic meters per second rate for this month
       value * dateDiff(&#39;second&#39;,
                        toStartOfMonth(subtractMonths(now(), 1)),
                        toStartOfMonth(now()))    AS volume -- cubic meters volume for this month

FROM (
         SELECT pp.profile_type                                                                   AS profile_type,
                trimBoth(splitByChar(&#39;-&#39;, case_name)[1])                                          AS reservoir,
                JSONExtractString(cd.data, &#39;case_data&#39;, &#39;Tags$$Tag&#39;)                              AS case_tag,
                JSONExtractString(cd.data, &#39;case_data&#39;, &#39;Tags$$Endorsed&#39;)                         AS is_endorsed,
                -- Endorsement Data, is the timestamp when the user &quot;endorsed&quot; the case
                JSONExtract(cd.data, &#39;case_data&#39;, &#39;Tags$$EndorsementDate&#39;, &#39;time_stamp&#39;, &#39;Int64&#39;) AS endorsement_date,
                -- Endorsement Month is the month of year for which the case was actually endorsed
                JSONExtractString(cd.data, &#39;case_data&#39;, &#39;Tags$$MonthTags&#39;)                        AS endorsed_for_month,
                pp.variable_name                                                                  AS variable_name,
                JSONExtractString(pp.data, &#39;profile_phase&#39;)                                       AS profile_phase,
                JSONExtractString(wd.data, &#39;name&#39;)                                                AS well_name,
                JSONExtractString(cd.data, &#39;header&#39;, &#39;name&#39;)                                      AS case_name,
                -- We might want to have asset id here to use in roll-up
                JSONExtract(cd.data, &#39;header&#39;, &#39;reservoir_asset_id&#39;, &#39;Int64&#39;)                     AS asset_id, -- Asset Id in ARM
                JSONExtract(pp.data, &#39;end_of_history&#39;, &#39;Int64&#39;)                                   AS end_of_history,
                JSONExtract(pp.data, &#39;values&#39;, &#39;Array(Float64)&#39;)                                  AS values,
                JSONExtract(pp.data, &#39;timestamps&#39;, &#39;Array(Int64)&#39;)                                AS timestamps,
                JSONExtract(pp.data, &#39;end_of_history&#39;, &#39;Int64&#39;)                                   AS eoh
         FROM production_profile AS pp
                  INNER JOIN well_data AS wd ON wd.uuid = pp.well_id
                  INNER JOIN case_data AS cd ON cd.uuid = pp.case_id
         )
    ARRAY JOIN
     values AS value,
     timestamps AS ts
;

CREATE MATERIALIZED VIEW IF NOT EXISTS production_volume_actual
            ENGINE = ReplacingMergeTree
                ORDER BY (asset_id,
                          case_tag,
                          variable_name,
                          endorsement_date) POPULATE
AS
SELECT profile_type,
       case_tag,
       is_endorsed,
       endorsement_date,
       endorsed_for_month,
       variable_name,
       profile_phase,
       asset_id,
       sum(volume) AS total_actual_volume
FROM production_gross
WHERE timestamp &lt; end_of_history
GROUP BY profile_type,
         case_tag,
         is_endorsed,
         endorsement_date,
         endorsed_for_month,
         variable_name,
         profile_phase,
         asset_id
ORDER BY asset_id ASC,
         case_tag ASC,
         variable_name ASC,
         endorsement_date ASC
;

As you can see, the second view is an "aggregation" on the first, and that is why I need it. If I want to do the aggregation from scratch, a lot of processes has to be done twice.

Update:
I have tried to change the query to the following:

SELECT ...
FROM `.inner.production_gross`
...

Which did not help. This query resulted in the following error:

Code: 60. DB::Exception: Table default.`.inner.production_gross` doesn&#39;t exist.

Then, based on the comment by @DennyCrane and using this answer: https://stackoverflow.com/a/67709334/959156, I run this query:

SELECT
    uuid,
    name
FROM system.tables
WHERE database = &#39;default&#39; AND engine = &#39;MaterializedView&#39;

Which gave me the uuid of the inner table:

ebab2dc5-2887-4e7d-998d-6acaff122fc7

So, I ran this query:

SELECT ...
FROM `.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7`

Which resulted in the following error:

Code: 60. DB::Exception: Table default.`.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7` doesn&#39;t exist.

英文:

The tile pretty much says it. I want to create a Materialized View whose "SELECT" clause SELECTs data from another Materialized View in Clickhouse. I have tried this. The SQL for "createion" fo the two views runs without an error. But upon runtime, the first view is populated, but the second one isn't.

I need to know if I am making a mistake in my SQL or this is just simply not possible.

Here's my two views:

CREATE MATERIALIZED VIEW IF NOT EXISTS production_gross
ENGINE = ReplacingMergeTree
ORDER BY (profile_type, reservoir, case_tag, variable_name, profile_phase, well_name, case_name,
timestamp) POPULATE
AS
SELECT profile_type,
reservoir,
case_tag,
is_endorsed,
toDateTime64(endorsement_date / 1000.0, 0) AS endorsement_date,
endorsed_for_month,
variable_name,
profile_phase,
well_name,
case_name,
asset_id,
toDateTime64(eoh / 1000, 0)                as end_of_history,
toDateTime64(ts / 1000, 0)                 as timestamp,
value,                                     -- AS rate,  -- cubic meters per second rate for this month
value * dateDiff(&#39;second&#39;,
toStartOfMonth(subtractMonths(now(), 1)),
toStartOfMonth(now()))    AS volume -- cubic meters volume for this month
FROM (
SELECT pp.profile_type                                                                   AS profile_type,
trimBoth(splitByChar(&#39;-&#39;, case_name)[1])                                          AS reservoir,
JSONExtractString(cd.data, &#39;case_data&#39;, &#39;Tags$$Tag&#39;)                              AS case_tag,
JSONExtractString(cd.data, &#39;case_data&#39;, &#39;Tags$$Endorsed&#39;)                         AS is_endorsed,
-- Endorsement Data, is the timestamp when the user &quot;endorsed&quot; the case
JSONExtract(cd.data, &#39;case_data&#39;, &#39;Tags$$EndorsementDate&#39;, &#39;time_stamp&#39;, &#39;Int64&#39;) AS endorsement_date,
-- Endorsement Month is the month of year for which the case was actually endorsed
JSONExtractString(cd.data, &#39;case_data&#39;, &#39;Tags$$MonthTags&#39;)                        AS endorsed_for_month,
pp.variable_name                                                                  AS variable_name,
JSONExtractString(pp.data, &#39;profile_phase&#39;)                                       AS profile_phase,
JSONExtractString(wd.data, &#39;name&#39;)                                                AS well_name,
JSONExtractString(cd.data, &#39;header&#39;, &#39;name&#39;)                                      AS case_name,
-- We might want to have asset id here to use in roll-up
JSONExtract(cd.data, &#39;header&#39;, &#39;reservoir_asset_id&#39;, &#39;Int64&#39;)                     AS asset_id, -- Asset Id in ARM
JSONExtract(pp.data, &#39;end_of_history&#39;, &#39;Int64&#39;)                                   AS end_of_history,
JSONExtract(pp.data, &#39;values&#39;, &#39;Array(Float64)&#39;)                                  AS values,
JSONExtract(pp.data, &#39;timestamps&#39;, &#39;Array(Int64)&#39;)                                AS timestamps,
JSONExtract(pp.data, &#39;end_of_history&#39;, &#39;Int64&#39;)                                   AS eoh
FROM production_profile AS pp
INNER JOIN well_data AS wd ON wd.uuid = pp.well_id
INNER JOIN case_data AS cd ON cd.uuid = pp.case_id
)
ARRAY JOIN
values AS value,
timestamps AS ts
;
CREATE MATERIALIZED VIEW IF NOT EXISTS production_volume_actual
ENGINE = ReplacingMergeTree
ORDER BY (asset_id,
case_tag,
variable_name,
endorsement_date) POPULATE
AS
SELECT profile_type,
case_tag,
is_endorsed,
endorsement_date,
endorsed_for_month,
variable_name,
profile_phase,
asset_id,
sum(volume) AS total_actual_volume
FROM production_gross
WHERE timestamp &lt; end_of_history
GROUP BY profile_type,
case_tag,
is_endorsed,
endorsement_date,
endorsed_for_month,
variable_name,
profile_phase,
asset_id
ORDER BY asset_id ASC,
case_tag ASC,
variable_name ASC,
endorsement_date ASC
;

As you can see, the second view is an "aggregation" on the first, and that is why I need it. If I want to do the aggregation from scratch, a lot of processes has to be done twice.

Update:
I have tried to change the query to the following:

SELECT ...
FROM `.inner.production_gross`
...

Which did not help. This query resulted in the following error:

Code: 60. DB::Exception: Table default.`.inner.production_gross` doesn&#39;t exist.

Then, based on the comment by @DennyCrane and using this answer: https://stackoverflow.com/a/67709334/959156, I run this query:

SELECT
uuid,
name
FROM system.tables
WHERE database = &#39;default&#39; AND engine = &#39;MaterializedView&#39;

Which gave me the uuid of the inner table:

ebab2dc5-2887-4e7d-998d-6acaff122fc7

So, I ran this query:

SELECT ...
FROM `.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7`

Which resulted in the following error:

Code: 60. DB::Exception: Table default.`.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7` doesn&#39;t exist.

答案1

得分: 2

材质化视图在实际数据表上充当插入触发器，因此您的 production_volume_actual 表必须对数据表执行 SELECT 操作，而不是一个 "视图"。

如果您使用引擎（而不是 TO 另一个数据表）来创建一个材质化视图，在旧版本中 ClickHouse 实际上会创建一个名为 .inner.<mv_name> 的数据表（不使用 Atomic 数据库引擎），或者如果使用 Atomic 或 Replicated 数据库引擎，则会创建一个名为 .inner_id.<some UUID>. 的数据表。因此，如果您将第二个视图中的 SELECT 更改为这个 "inner" 表名，可以使用以下方式：

select from `.inner.production_gross`

select from `.inner_id.&lt;UUID&gt;`  -- 请注意 "inner" 上的额外 '_id'

它应该可以正常工作。

这个答案可以指导您找到正确的 UUID。

在 ClickHouse，我们实际上建议您始终将材质化视图创建为 TO <second_table>，以避免这种混淆，并使对 <second_table> 的操作更简单和更透明。

（感谢 OP Mostafa Zeinali 和 Denny Crane 对于最近的 ClickHouse 版本的澄清）

英文:

Materialized views work as insert triggers on actual data tables, so your production_volume_actual table has to do a SELECT on a data table, not a "view".

If you CREATE a materialized view using an ENGINE (and not as TO another data table), ClickHouse actually creates a data table with the name .inner.<mv_name> on older versions (not using an Atomic database engine), or .inner_id.<some UUID>. if using an Atomic or Replicated database engine. So if you change the select in your second view to this "inner" table name, either:

select from `.inner.production_gross`

select from `.inner_id.&lt;UUID&gt;`  -- note the extra &#39;_id&#39; on &#39;inner&#39;

It should work.

This answer can point you to the right UUID.

At ClickHouse we actually recommend you always create Materialized Views as TO <second_table> to avoid this kind of confusion, and to make operations on <second_table> simpler and more transparent.

(Thanks to OP Mostafa Zeinali and Denny Crane for the clarification for more recent ClickHouse versions)

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

我能否从另一个ClickHouse中的物化视图创建一个物化视图？

问题

答案1

如何使用一个命令在ClickHouse中使用Golang插入CSV文件

代码：36. DB::Exception: 位置选项不受支持。（BAD_ARGUMENTS）

SQL ClickHouse如何根据不同列中的值使用邻居函数

分区在ClickHouse中的实际用途是什么？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论