PERCENTILE_CONT together with aggregate functions
Question
I'm using SQL Server 2012 and trying to calculate some aggregate functions and percentiles on a data set with two columns (`id` and `time_seconds`). I found some solutions while researching, but they don't work for me (maybe because of my outdated SQL Server version).
id | time_seconds |
---|---|
1 | 120 |
2 | 10 |
3 | 50 |
4 | 80 |
5 | 60 |
6 | 42 |
7 | 96 |
I tried something like this:

    SELECT
        MIN(time_seconds) AS min,
        MAX(time_seconds) AS max,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY time_seconds) OVER () AS '95 percentil'
    FROM table

If I run this query, it asks me to include `time_seconds` in the GROUP BY, but then the aggregate functions don't work anymore.

The output I'd like to get is something like this:

min | max | 95 percentil |
---|---|---|
10 | 120 | 110 |
Thanks for your help!
Answer 1
Score: 3
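Unfortunately, in SQL Server `PERCENTILE_CONT` works only as a window function, not as an aggregate function. This is a (rare) case where `select distinct` can come in handy:

    select distinct
        min(time_seconds) over () as min_time_seconds,
        max(time_seconds) over () as max_time_seconds,
        percentile_cont(0.95) within group (order by time_seconds) over () as percentil_95
    from mytable

Basically, this approaches the problem from the opposite direction: since we can't compute the percentile as an aggregate, we turn the other aggregate functions into window functions. The window computations yield the same values on every row, so we can then deduplicate the result set with `distinct`.

| min_time_seconds | max_time_seconds | percentil_95 |
| ----------------:|-----------------:|-------------:|
| 10 | 120 | 112.8 |

[fiddle](https://dbfiddle.uk/z_Yz7qLR) - this is SQL Server 2014, since db<>fiddle does not support 2012.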
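
Note that the query returns 112.8 rather than the 110 sketched in the question. The following is a minimal Python sketch of the arithmetic, assuming `PERCENTILE_CONT` follows the usual linear-interpolation definition: the requested percentile sits at fractional position `fraction * (n - 1)` in the sorted values, and the result is interpolated between the two neighbouring rows. The helper name `percentile_cont` and the Python port are illustrative only and are not part of the SQL above.

```python
import math


def percentile_cont(values, fraction):
    """Linear-interpolation percentile of values, with 0 <= fraction <= 1.

    Illustrative helper (an assumption, not a SQL Server API).
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # Fractional 0-based position of the requested percentile in the sorted list.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    # Interpolate between the two rows surrounding that position.
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# Sample data from the question.
time_seconds = [120, 10, 50, 80, 60, 42, 96]
p95 = percentile_cont(time_seconds, 0.95)
print(round(p95, 1))
```

With the seven sample values the position is 0.95 × 6 = 5.7, so the result lies 70% of the way from 96 to 120: 96 + 0.7 × 24 = 112.8.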
Answer 2
Score: 0
Try this:

    ;WITH cte AS (
        -- Number the rows by time_seconds and attach the total row count to each row.
        SELECT
            CAST(COUNT(*) OVER () AS float) AS total_rows,
            ROW_NUMBER() OVER (ORDER BY time_seconds) AS row_num,
            time_seconds
        FROM (
            -- Inline sample data; replace this derived table with your own table.
            SELECT 1 AS id, 120 AS time_seconds UNION ALL
            SELECT 2 AS id, 10 AS time_seconds UNION ALL
            SELECT 3 AS id, 50 AS time_seconds UNION ALL
            SELECT 4 AS id, 80 AS time_seconds UNION ALL
            SELECT 5 AS id, 60 AS time_seconds UNION ALL
            SELECT 6 AS id, 42 AS time_seconds UNION ALL
            SELECT 7 AS id, 96 AS time_seconds
        ) AS YourTable
    ),
    percentiles AS (
        -- Percentile rank of each row and of the row that follows it.
        SELECT
            time_seconds,
            100.0 * ((row_num - 1) / (total_rows - 1)) AS percentile,
            LEAD(time_seconds) OVER (ORDER BY row_num) AS next_time_seconds,
            100.0 * (row_num / (total_rows - 1)) AS next_percentile
        FROM cte
    )
    SELECT
        MIN(time_seconds) AS min_time_seconds,
        MAX(time_seconds) AS max_time_seconds,
        (
            -- Interpolate linearly between the two rows that bracket the 95th percentile.
            SELECT
                time_seconds + (next_time_seconds - time_seconds) * (95 - percentile) / (next_percentile - percentile)
            FROM percentiles
            -- Strict upper bound: only one row matches even when a row sits exactly on 95.
            WHERE percentile <= 95 AND next_percentile > 95
        ) AS percentile_95
    FROM cte;