英文:
How can I produce an aggregated result with subgroups in Clickhouse?
问题
I want to be able to effectively produce an aggregated result from clickhouse table along with sort of an array with subgroups
So, let me give an example
Say, I have the following exemplary table
Column | Type |
---|---|
id | String |
client | String |
v1 | Int |
v2 | Int |
when | DateTime |
Starting with simple aggregation query, like
SELECT id, AVG(v1) AVG1, SUM(v2) SUM2 FROM table WHERE When > today() GROUP BY id
that will produce something like
ID | AVG1 | SUM2 |
---|---|---|
1 | 100 | 300 |
2 | 200 | 400 |
... | ... | ... |
I want extend the result with something like this
ID | AVG | SUM2 | Rows per Client |
---|---|---|---|
1 | 100 | 300 | [{AVG1:110, SUM2:150},{AVG1:90, SUM2:50},{AVG1:100, SUM2:100}] |
2 | 200 | 400 | [{200, 100},{200, 300},{200, 0}] |
... | ... | ... | [...] |
The rows per client
field is aggregated with the same filters as the main query, but also applies extra group by to its results
I'm curious, if something like this is even possible in Clickhouse (and, if so, what's the most efficient way to do so), or do I have to use Join and then programmatically parse the results?
Joins are the best I've managed to accomplish so far, but the resulting query wasn't optimal, as I had to select the same data twice (note, that the table and queries I used are indeed exemplary, and the real ones has far more fields and more sophisticated aggregation), and the results are not quite the same to what I'm trying to accomplish
英文:
I want to be able to effectively produce an aggregated result from clickhouse table along with sort of an array with subgroups
So, let me give an example
Say, I have the following exemplary table
Column | Type |
---|---|
id | String |
client | String |
v1 | Int |
v2 | Int |
when | DateTime |
Starting with simple aggregation query, like
SELECT id, AVG(v1) AVG1, SUM(v2) SUM2 FROM table WHERE When > today() GROUP BY id
that will produce something like
ID | AVG1 | SUM2 |
---|---|---|
1 | 100 | 300 |
2 | 200 | 400 |
... | ... | ... |
I want extend the result with something like this
ID | AVG | SUM2 | Rows per Client |
---|---|---|---|
1 | 100 | 300 | [{AVG1:110, SUM2:150},{AVG1:90, SUM2:50},{AVG1:100, SUM2:100}] |
2 | 200 | 400 | [{200, 100},{200, 300},{200, 0}] |
... | ... | ... | [...] |
The rows per client
field is aggregated with the same filters as the main query, but also applies extra group by to its results
I'm curios, if something like this is even possible in Clickhouse (and, if so, what's the most efficient way to do so), or do I have to use Join and then programmatically parse the results?
Joins are the best I've managed to accomplish so far, but the resulting query wasn't optimal, as I had to select the same data twice (note, that the table and queries I used are indeed exemplary, and the real ones has far more fields and more sophisticated aggregation), and the results are not quite the same to what I'm trying to accomplish
答案1
得分: 1
以下是翻译的内容:
创建表 I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY (
id String,
client String,
v1 Int,
v2 Int,
When DateTime
) Engine=Memory;
插入到 I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY
选择编号,arrayJoin(['client1', 'client2', 'client3']),
数字%10, 数字%3, today()
从 numbers(15);
选择
id,
avgMerge(AVG1s) 作为 AVG1,
sum(SUM2s) 作为 SUM2,
CAST(groupArray((client, (finalizeAggregation(AVG1s), SUM2s))), 'Map(String, Tuple(avg Float64, sum Int64))') 作为 r
从
(
选择
id,
client,
avgState(v1) 作为 AVG1s,
SUM(v2) 作为 SUM2s
从 I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY
其中 When >= today()
分组按
id,
client
)
分组按 id
按 id 升序排序
请注意,我已将 SQL 查询的代码部分翻译为中文,如您所要求。
英文:
create table I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY(
id String,
client String,
v1 Int,
v2 Int,
When DateTime) Engine=Memory;
insert into I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY
select number, arrayJoin(['client1', 'client2', 'client3']),
number%10, number%3, today()
from numbers(15);
SELECT
id,
avgMerge(AVG1s) AS AVG1,
sum(SUM2s) AS SUM2,
CAST(groupArray((client, (finalizeAggregation(AVG1s), SUM2s))), 'Map(String, Tuple(avg Float64, sum Int64))') AS r
FROM
(
SELECT
id,
client,
avgState(v1) AS AVG1s,
SUM(v2) AS SUM2s
FROM I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY
WHERE When >= today()
GROUP BY
id,
client
)
GROUP BY id
ORDER BY id ASC
┌─id─┬─AVG1─┬─SUM2─┬─r─────────────────────────────────────────────────┐
│ 0 │ 0 │ 0 │ {'client2':(0,0),'client3':(0,0),'client1':(0,0)} │
│ 1 │ 1 │ 3 │ {'client1':(1,1),'client2':(1,1),'client3':(1,1)} │
│ 10 │ 0 │ 3 │ {'client3':(0,1),'client2':(0,1),'client1':(0,1)} │
│ 11 │ 1 │ 6 │ {'client3':(1,2),'client2':(1,2),'client1':(1,2)} │
│ 12 │ 2 │ 0 │ {'client2':(2,0),'client3':(2,0),'client1':(2,0)} │
│ 13 │ 3 │ 3 │ {'client2':(3,1),'client3':(3,1),'client1':(3,1)} │
│ 14 │ 4 │ 6 │ {'client3':(4,2),'client2':(4,2),'client1':(4,2)} │
│ 2 │ 2 │ 6 │ {'client1':(2,2),'client3':(2,2),'client2':(2,2)} │
│ 3 │ 3 │ 0 │ {'client3':(3,0),'client2':(3,0),'client1':(3,0)} │
│ 4 │ 4 │ 3 │ {'client1':(4,1),'client3':(4,1),'client2':(4,1)} │
│ 5 │ 5 │ 6 │ {'client3':(5,2),'client2':(5,2),'client1':(5,2)} │
│ 6 │ 6 │ 0 │ {'client2':(6,0),'client3':(6,0),'client1':(6,0)} │
│ 7 │ 7 │ 3 │ {'client1':(7,1),'client2':(7,1),'client3':(7,1)} │
│ 8 │ 8 │ 6 │ {'client3':(8,2),'client2':(8,2),'client1':(8,2)} │
│ 9 │ 9 │ 0 │ {'client1':(9,0),'client3':(9,0),'client2':(9,0)} │
└────┴──────┴──────┴───────────────────────────────────────────────────┘
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论