如何在Clickhouse中生成带有子组的聚合结果?

huangapple go评论40阅读模式
英文:

How can I produce an aggregated result with subgroups in Clickhouse?

问题

I want to be able to effectively produce an aggregated result from clickhouse table along with sort of an array with subgroups

So, let me give an example

Say, I have the following exemplary table

Column Type
id String
client String
v1 Int
v2 Int
when DateTime

Starting with simple aggregation query, like

SELECT id, AVG(v1) AVG1, SUM(v2) SUM2 FROM table WHERE When > today() GROUP BY id
that will produce something like

ID AVG1 SUM2
1 100 300
2 200 400
... ... ...

I want extend the result with something like this

ID AVG SUM2 Rows per Client
1 100 300 [{AVG1:110, SUM2:150},{AVG1:90, SUM2:50},{AVG1:100, SUM2:100}]
2 200 400 [{200, 100},{200, 300},{200, 0}]
... ... ... [...]

The rows per client field is aggregated with the same filters as the main query, but also applies extra group by to its results

I'm curious, if something like this is even possible in Clickhouse (and, if so, what's the most efficient way to do so), or do I have to use Join and then programmatically parse the results?

Joins are the best I've managed to accomplish so far, but the resulting query wasn't optimal, as I had to select the same data twice (note, that the table and queries I used are indeed exemplary, and the real ones has far more fields and more sophisticated aggregation), and the results are not quite the same to what I'm trying to accomplish

英文:

I want to be able to effectively produce an aggregated result from clickhouse table along with sort of an array with subgroups

So, let me give an example

Say, I have the following exemplary table

Column Type
id String
client String
v1 Int
v2 Int
when DateTime

Starting with simple aggregation query, like

SELECT id, AVG(v1) AVG1, SUM(v2) SUM2 FROM table WHERE When > today() GROUP BY id
that will produce something like

ID AVG1 SUM2
1 100 300
2 200 400
... ... ...

I want extend the result with something like this

ID AVG SUM2 Rows per Client
1 100 300 [{AVG1:110, SUM2:150},{AVG1:90, SUM2:50},{AVG1:100, SUM2:100}]
2 200 400 [{200, 100},{200, 300},{200, 0}]
... ... ... [...]

The rows per client field is aggregated with the same filters as the main query, but also applies extra group by to its results

I'm curios, if something like this is even possible in Clickhouse (and, if so, what's the most efficient way to do so), or do I have to use Join and then programmatically parse the results?

Joins are the best I've managed to accomplish so far, but the resulting query wasn't optimal, as I had to select the same data twice (note, that the table and queries I used are indeed exemplary, and the real ones has far more fields and more sophisticated aggregation), and the results are not quite the same to what I'm trying to accomplish

答案1

得分: 1

以下是翻译的内容:

创建表 I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY (
  id 	String, 
  client 	String, 
  v1 	Int, 
  v2 	Int, 
  When 	DateTime
) Engine=Memory;

插入到 I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY 
选择编号,arrayJoin(['client1', 'client2', 'client3']), 
     数字%10, 数字%3, today()
 numbers(15);

选择
    id,
    avgMerge(AVG1s) 作为 AVG1,
    sum(SUM2s) 作为 SUM2,
    CAST(groupArray((client, (finalizeAggregation(AVG1s), SUM2s))), 'Map(String, Tuple(avg Float64, sum Int64))') 作为 r

(
    选择
        id,
        client,
        avgState(v1) 作为 AVG1s,
        SUM(v2) 作为 SUM2s
     I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY
    其中 When >= today()
    分组按
        id,
        client
)
分组按 id
 id 升序排序

请注意,我已将 SQL 查询的代码部分翻译为中文,如您所要求。

英文:
create table I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY(
  id 	String,
  client 	String,
  v1 	Int,
  v2 	Int,
When 	DateTime) Engine=Memory;

insert into I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY
select number, arrayJoin(['client1', 'client2', 'client3']),
     number%10, number%3, today()
from numbers(15);


SELECT
    id,
    avgMerge(AVG1s) AS AVG1,
    sum(SUM2s) AS SUM2,
    CAST(groupArray((client, (finalizeAggregation(AVG1s), SUM2s))), 'Map(String, Tuple(avg Float64, sum Int64))') AS r
FROM
(
    SELECT
        id,
        client,
        avgState(v1) AS AVG1s,
        SUM(v2) AS SUM2s
    FROM I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY
    WHERE When >= today()
    GROUP BY
        id,
        client
)
GROUP BY id
ORDER BY id ASC

┌─id─┬─AVG1─┬─SUM2─┬─r─────────────────────────────────────────────────┐
 0      0     0  {'client2':(0,0),'client3':(0,0),'client1':(0,0)} 
 1      1     3  {'client1':(1,1),'client2':(1,1),'client3':(1,1)} 
 10     0     3  {'client3':(0,1),'client2':(0,1),'client1':(0,1)} 
 11     1     6  {'client3':(1,2),'client2':(1,2),'client1':(1,2)} 
 12     2     0  {'client2':(2,0),'client3':(2,0),'client1':(2,0)} 
 13     3     3  {'client2':(3,1),'client3':(3,1),'client1':(3,1)} 
 14     4     6  {'client3':(4,2),'client2':(4,2),'client1':(4,2)} 
 2      2     6  {'client1':(2,2),'client3':(2,2),'client2':(2,2)} 
 3      3     0  {'client3':(3,0),'client2':(3,0),'client1':(3,0)} 
 4      4     3  {'client1':(4,1),'client3':(4,1),'client2':(4,1)} 
 5      5     6  {'client3':(5,2),'client2':(5,2),'client1':(5,2)} 
 6      6     0  {'client2':(6,0),'client3':(6,0),'client1':(6,0)} 
 7      7     3  {'client1':(7,1),'client2':(7,1),'client3':(7,1)} 
 8      8     6  {'client3':(8,2),'client2':(8,2),'client1':(8,2)} 
 9      9     0  {'client1':(9,0),'client3':(9,0),'client2':(9,0)} 
└────┴──────┴──────┴───────────────────────────────────────────────────┘

huangapple
  • 本文由 发表于 2023年5月31日 23:20:08
  • 转载请务必保留本文链接:https://go.coder-hub.com/76375051.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定