How can I do "double-aggregation" in T-SQL / SQL Server?
Question
I have a table which contains Issues and SubIssues in a customer service system. The data might look like this:
id | ticket_id | sub_ticket_id | response_time_in_mins | comment | cond1 | cond2 | cond3 |
---|---|---|---|---|---|---|---|
1 | 1000 | NULL | NULL | "the overall ticket about how quickly you closed down my account" | 1 | 0 | 1 |
2 | 1000 | 1 | 12 | "send ack email" | 1 | 0 | 1 |
3 | 1000 | 2 | 30 | "look up user in all DBs and remove" | 1 | 1 | 1 |
4 | 1000 | 3 | 5 | "send finished email" | 1 | 0 | 0 |
5 | 1001 | NULL | NULL | "the overall ticket about do you have my email in your sys?" | 1 | 0 | 1 |
6 | 1001 | 1 | 2 | "send ack email" | 0 | 0 | 1 |
7 | 1001 | 2 | 10 | "look up and notify" | 1 | 1 | 0 |
... |
For each ticket_id, I'd like to total the response times of the rows that satisfy each condition, one condition at a time. For example, the resulting table might look like this (but assume N conditions):
ticket_id | total_resp_time_cond_1 | total_resp_time_cond_2 | total_resp_time_cond_3 |
---|---|---|---|
1000 | 47 | 30 | 42 |
1001 | 10 | 10 | 2 |
... |
Here it is in pseudocode, as if I could mix Python and SQL:
```
foreach ticket_number in issues_and_subissues:
    foreach condition in cond1, cond2, cond3, ... condN:
        SELECT SUM(response_time_in_mins) FROM issues_and_subissues isst WHERE isst.ticket_id = ticket_number AND condition = 1
```
Is there a T-SQL-only way to achieve this with JOINs, or would I have to use cursors? Are there any other options?
I searched Stack Overflow and beyond, and all I could find was the cursor approach. Thank you!
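To try the queries in this thread, here is a minimal repro script for the sample data above. It is a sketch for SQL Server: the temp table name `#issues_and_subissues` comes from the pseudocode, while the column types (`INT` ids and times, `BIT` flags, an `NVARCHAR` comment) are assumptions, since the question doesn't state them.

```sql
-- Minimal repro of the question's sample rows.
-- Column types are assumptions; the question does not give them.
CREATE TABLE #issues_and_subissues (
    id                    INT           NOT NULL PRIMARY KEY,
    ticket_id             INT           NOT NULL,
    sub_ticket_id         INT           NULL,  -- NULL marks the parent ticket row
    response_time_in_mins INT           NULL,  -- NULL on parent ticket rows
    [comment]             NVARCHAR(200) NOT NULL,
    cond1                 BIT           NOT NULL,
    cond2                 BIT           NOT NULL,
    cond3                 BIT           NOT NULL
);

INSERT INTO #issues_and_subissues
    (id, ticket_id, sub_ticket_id, response_time_in_mins, [comment], cond1, cond2, cond3)
VALUES
    (1, 1000, NULL, NULL, N'the overall ticket about how quickly you closed down my account', 1, 0, 1),
    (2, 1000, 1,    12,   N'send ack email',                                               1, 0, 1),
    (3, 1000, 2,    30,   N'look up user in all DBs and remove',                           1, 1, 1),
    (4, 1000, 3,    5,    N'send finished email',                                          1, 0, 0),
    (5, 1001, NULL, NULL, N'the overall ticket about do you have my email in your sys?',   1, 0, 1),
    (6, 1001, 1,    2,    N'send ack email',                                               0, 0, 1),
    (7, 1001, 2,    10,   N'look up and notify',                                           1, 1, 0);
```

To run an answer against it, substitute `#issues_and_subissues` for the answer's placeholder table name.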
Answer 1
Score: 3
We can sum the condition flag multiplied by the response time. This answer assumes that the condition flags would only ever be 0 or 1. If not, then this answer would have to change.
```sql
SELECT
    ticket_id,
    SUM(cond1 * response_time_in_mins) total_resp_time_cond_1,
    SUM(cond2 * response_time_in_mins) total_resp_time_cond_2,
    SUM(cond3 * response_time_in_mins) total_resp_time_cond_3
FROM yourTable
GROUP BY ticket_id
ORDER BY ticket_id;
```
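Here is a self-contained check of this approach against the question's sample rows, as a SQL Server sketch. It assumes the flags are `BIT` columns and keeps only the columns the query needs, in a table variable named `@sample` (a name chosen for illustration). Multiplying a `BIT` by an `INT` converts the flag to `INT`. Because the parent-ticket rows have a `NULL` response time, their product is `NULL`, and `SUM` skips it.

```sql
-- Question's sample rows; BIT flags are an assumption.
DECLARE @sample TABLE (
    ticket_id             INT NOT NULL,
    response_time_in_mins INT NULL,  -- NULL on the parent ticket rows
    cond1 BIT NOT NULL,
    cond2 BIT NOT NULL,
    cond3 BIT NOT NULL
);
INSERT INTO @sample VALUES
    (1000, NULL, 1, 0, 1), (1000, 12, 1, 0, 1), (1000, 30, 1, 1, 1), (1000, 5, 1, 0, 0),
    (1001, NULL, 1, 0, 1), (1001, 2,  0, 0, 1), (1001, 10, 1, 1, 0);

-- flag * time is 0 when the flag is 0, the time when it is 1,
-- and NULL for parent rows (ignored by SUM).
SELECT ticket_id,
       SUM(cond1 * response_time_in_mins) AS total_resp_time_cond_1,
       SUM(cond2 * response_time_in_mins) AS total_resp_time_cond_2,
       SUM(cond3 * response_time_in_mins) AS total_resp_time_cond_3
INTO #totals
FROM @sample
GROUP BY ticket_id;

SELECT * FROM #totals ORDER BY ticket_id;
```

With these rows, the totals match the question's expected output: 47, 30, 42 for ticket 1000 and 10, 10, 2 for ticket 1001.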
Answer 2
Score: 2
It's called conditional aggregation. You do it by putting a CASE expression inside an aggregate function:
```sql
SELECT ticket_id,
    SUM(CASE WHEN cond1 = 1 THEN response_time_in_mins ELSE 0 END) total_resp_time_cond_1,
    SUM(CASE WHEN cond2 = 1 THEN response_time_in_mins ELSE 0 END) total_resp_time_cond_2,
    SUM(CASE WHEN cond3 = 1 THEN response_time_in_mins ELSE 0 END) total_resp_time_cond_3
FROM [table]
GROUP BY ticket_id
```
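The question asks about N conditions, and writing one `SUM(CASE ...)` per column by hand gets tedious. The following sketch generates the same conditional aggregation with dynamic SQL. It assumes SQL Server 2017 or later (for `STRING_AGG ... WITHIN GROUP`) and that every condition column's name starts with `cond`. The temp table `#tickets` and the generated column names (`total_resp_time_cond1`, ...) are illustrative. No cursor is needed, because the column list is built in a single query over the catalog.

```sql
-- Hypothetical temp table with the question's sample rows (types assumed).
CREATE TABLE #tickets (
    ticket_id             INT NOT NULL,
    response_time_in_mins INT NULL,
    cond1 BIT NOT NULL,
    cond2 BIT NOT NULL,
    cond3 BIT NOT NULL
);
INSERT INTO #tickets VALUES
    (1000, NULL, 1, 0, 1), (1000, 12, 1, 0, 1), (1000, 30, 1, 1, 1), (1000, 5, 1, 0, 0),
    (1001, NULL, 1, 0, 1), (1001, 2,  0, 0, 1), (1001, 10, 1, 1, 0);

DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- One "SUM(CASE ...) AS [total_resp_time_<col>]" term per column whose
-- name starts with "cond", in table column order. QUOTENAME brackets names.
SELECT @cols = STRING_AGG(
           CAST(N'SUM(CASE WHEN ' + QUOTENAME(c.name)
                + N' = 1 THEN response_time_in_mins ELSE 0 END) AS '
                + QUOTENAME(N'total_resp_time_' + c.name) AS NVARCHAR(MAX)),
           N', ') WITHIN GROUP (ORDER BY c.column_id)
FROM tempdb.sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'tempdb..#tickets')
  AND c.name LIKE N'cond%';

SET @sql = N'SELECT ticket_id, ' + @cols
         + N' FROM #tickets GROUP BY ticket_id ORDER BY ticket_id;';

-- The dynamic batch runs in a child scope, which can still see #tickets.
EXEC sys.sp_executesql @sql;
```

For a permanent table, the same idea should work by querying `sys.columns` with `OBJECT_ID(N'dbo.your_table')` (a placeholder name) instead of the `tempdb` catalog.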
Answer 3
Score: 0
If I understand you correctly, you want to create "conditions" out of thin air. This is usually how I do it (`somefield` and `anotherfield` are placeholders for whatever columns define each condition):
```sql
SELECT
    ticket_id,
    SUM(CASE WHEN cond = 1 THEN response_time_in_mins END) total_resp_time_cond_1,
    SUM(CASE WHEN cond = 2 THEN response_time_in_mins END) total_resp_time_cond_2,
    SUM(CASE WHEN cond = 3 THEN response_time_in_mins END) total_resp_time_cond_3
FROM yourTable t
CROSS APPLY (
    SELECT 1 AS cond
    WHERE t.somefield = 1
      AND t.anotherfield = 2
    UNION ALL
    SELECT 2 AS cond
    WHERE t.somefield = 2
      AND t.anotherfield = 3
    UNION ALL
    SELECT 3 AS cond
    WHERE t.somefield = 3
      AND t.anotherfield = 4
) c
GROUP BY ticket_id
ORDER BY ticket_id;
```
Alternatively:
```sql
SELECT
    ticket_id,
    SUM(CASE WHEN t.somefield = 1
              AND t.anotherfield = 2 THEN response_time_in_mins END) total_resp_time_cond_1,
    SUM(CASE WHEN t.somefield = 2
              AND t.anotherfield = 3 THEN response_time_in_mins END) total_resp_time_cond_2,
    SUM(CASE WHEN t.somefield = 3
              AND t.anotherfield = 4 THEN response_time_in_mins END) total_resp_time_cond_3
FROM yourTable t
GROUP BY ticket_id
ORDER BY ticket_id;
```
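Both approaches have their pros and cons, depending on how you want to aggregate the data. Sometimes it's useful to create a separate row for each condition, and in that case the CROSS APPLY version can be a good fit.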
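When a separate row per condition is the goal, another common T-SQL idiom is to unpivot the flag columns with `CROSS APPLY (VALUES ...)` and then group by ticket and condition. This is a sketch on the question's sample rows, not the answerer's own query. The table variable `@rows`, the `cond_name` labels, and the `BIT` flag type are assumptions.

```sql
-- Question's sample rows (types assumed).
DECLARE @rows TABLE (
    ticket_id             INT NOT NULL,
    response_time_in_mins INT NULL,
    cond1 BIT NOT NULL,
    cond2 BIT NOT NULL,
    cond3 BIT NOT NULL
);
INSERT INTO @rows VALUES
    (1000, NULL, 1, 0, 1), (1000, 12, 1, 0, 1), (1000, 30, 1, 1, 1), (1000, 5, 1, 0, 0),
    (1001, NULL, 1, 0, 1), (1001, 2,  0, 0, 1), (1001, 10, 1, 1, 0);

-- CROSS APPLY (VALUES ...) turns each source row into one row per condition
-- (an unpivot); WHERE keeps only the conditions whose flag is set.
SELECT r.ticket_id,
       c.cond_name,
       SUM(r.response_time_in_mins) AS total_resp_time
INTO #cond_totals
FROM @rows AS r
CROSS APPLY (VALUES (N'cond1', r.cond1),
                    (N'cond2', r.cond2),
                    (N'cond3', r.cond3)) AS c (cond_name, flag)
WHERE c.flag = 1
GROUP BY r.ticket_id, c.cond_name;

SELECT ticket_id, cond_name, total_resp_time
FROM #cond_totals
ORDER BY ticket_id, cond_name;
```

Unlike the wide queries, a ticket/condition pair with no flagged rows produces no row at all rather than 0 or NULL. Here all six pairs have data, giving 47, 30, 42 for ticket 1000 and 10, 10, 2 for ticket 1001.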