Posted: 2023-05-31 23:20:08


How can I produce an aggregated result with subgroups in Clickhouse?

Question

I want to efficiently produce an aggregated result from a ClickHouse table, together with an array of per-subgroup aggregates.

Let me give an example.

Say I have the following example table:

Column	Type
id	String
client	String
v1	Int
v2	Int
When	DateTime

Starting with a simple aggregation query like

SELECT id, AVG(v1) AVG1, SUM(v2) SUM2 FROM table WHERE When > today() GROUP BY id

that produces something like

ID	AVG1	SUM2
1	100	300
2	200	400
...	...	...

I want to extend the result with something like this:

ID	AVG1	SUM2	Rows per Client
1	100	300	[{AVG1:110, SUM2:150},{AVG1:90, SUM2:50},{AVG1:100, SUM2:100}]
2	200	400	[{200, 100},{200, 300},{200, 0}]
...	...	...	[...]

The "Rows per Client" field is aggregated with the same filters as the main query, but with an extra GROUP BY on client applied to its results.

I'm curious whether something like this is even possible in ClickHouse (and, if so, what the most efficient way to do it is), or whether I have to use a JOIN and then parse the results programmatically.

Joins are the best I've managed so far, but the resulting query wasn't optimal: I had to select the same data twice (note that the table and queries here are only examples; the real ones have far more fields and more sophisticated aggregation), and the results are not quite what I'm trying to accomplish.

Answer 1

Score: 1


-- Sample table and data: 15 ids, each with three clients
create table I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY(
  id 	String,
  client 	String,
  v1 	Int,
  v2 	Int,
  When 	DateTime) Engine=Memory;

insert into I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY
select number, arrayJoin(['client1', 'client2', 'client3']),
     number%10, number%3, today()
from numbers(15);

-- Outer query: merge the per-client states per id and collect the per-client rows into a Map
SELECT
    id,
    avgMerge(AVG1s) AS AVG1,
    sum(SUM2s) AS SUM2,
    CAST(groupArray((client, (finalizeAggregation(AVG1s), SUM2s))), 'Map(String, Tuple(avg Float64, sum Int64))') AS r
FROM
(
    -- Inner query: aggregate per (id, client), keeping the average as a mergeable state
    SELECT
        id,
        client,
        avgState(v1) AS AVG1s,
        SUM(v2) AS SUM2s
    FROM I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY
    WHERE When >= today()
    GROUP BY
        id,
        client
)
GROUP BY id
ORDER BY id ASC

┌─id─┬─AVG1─┬─SUM2─┬─r─────────────────────────────────────────────────┐
│ 0  │    0 │    0 │ {'client2':(0,0),'client3':(0,0),'client1':(0,0)} │
│ 1  │    1 │    3 │ {'client1':(1,1),'client2':(1,1),'client3':(1,1)} │
│ 10 │    0 │    3 │ {'client3':(0,1),'client2':(0,1),'client1':(0,1)} │
│ 11 │    1 │    6 │ {'client3':(1,2),'client2':(1,2),'client1':(1,2)} │
│ 12 │    2 │    0 │ {'client2':(2,0),'client3':(2,0),'client1':(2,0)} │
│ 13 │    3 │    3 │ {'client2':(3,1),'client3':(3,1),'client1':(3,1)} │
│ 14 │    4 │    6 │ {'client3':(4,2),'client2':(4,2),'client1':(4,2)} │
│ 2  │    2 │    6 │ {'client1':(2,2),'client3':(2,2),'client2':(2,2)} │
│ 3  │    3 │    0 │ {'client3':(3,0),'client2':(3,0),'client1':(3,0)} │
│ 4  │    4 │    3 │ {'client1':(4,1),'client3':(4,1),'client2':(4,1)} │
│ 5  │    5 │    6 │ {'client3':(5,2),'client2':(5,2),'client1':(5,2)} │
│ 6  │    6 │    0 │ {'client2':(6,0),'client3':(6,0),'client1':(6,0)} │
│ 7  │    7 │    3 │ {'client1':(7,1),'client2':(7,1),'client3':(7,1)} │
│ 8  │    8 │    6 │ {'client3':(8,2),'client2':(8,2),'client1':(8,2)} │
│ 9  │    9 │    0 │ {'client1':(9,0),'client3':(9,0),'client2':(9,0)} │
└────┴──────┴──────┴───────────────────────────────────────────────────┘
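
If you would rather get an array of rows, as in the question's "Rows per Client" column, instead of a Map keyed by client, the same two-level aggregation can probably be reshaped as below. This is only a sketch: it reuses the sample table from the query above, and the aliases avg1_state, sum2_part and rows_per_client are names I made up for illustration. The data is still read once. The inner query keeps the average as a state (avgState), so the outer query can merge those states into the correct per-id average (avgMerge) rather than averaging averages, which would be wrong whenever clients have different row counts.

```sql
-- Sketch: same data as the query above, but returning an array of named tuples.
-- Aliases (avg1_state, sum2_part, rows_per_client) are hypothetical names.
SELECT
    id,
    avgMerge(avg1_state) AS AVG1,   -- merge per-client avg states into the per-id average
    sum(sum2_part) AS SUM2,         -- partial sums can simply be summed again
    CAST(
        groupArray((client, finalizeAggregation(avg1_state), sum2_part)),
        'Array(Tuple(client String, AVG1 Float64, SUM2 Int64))'
    ) AS rows_per_client            -- one tuple per client within each id
FROM
(
    -- One pass over the table: aggregate per (id, client)
    SELECT
        id,
        client,
        avgState(v1) AS avg1_state, -- intermediate state, not a final number
        sum(v2) AS sum2_part
    FROM I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY
    WHERE When >= today()
    GROUP BY
        id,
        client
)
GROUP BY id
ORDER BY id ASC
```

The order of tuples inside the array is not guaranteed, just like the key order in the Map output above, so sort on the client side (or wrap the array in arraySort) if you need a stable order.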
