percentile_cont with aggregate function

Question

I'm using SQL Server 2012 and trying to calculate some aggregate functions and percentiles on a data set with two columns (id and time_seconds). In my research I found some solutions, but they don't work for me (maybe because of my outdated SQL Server version).

| id | time_seconds |
| --:| ------------:|
| 1 | 120 |
| 2 | 10 |
| 3 | 50 |
| 4 | 80 |
| 5 | 60 |
| 6 | 42 |
| 7 | 96 |

I tried something like this:

SELECT 
    MIN(time_seconds) AS min,
    MAX(time_seconds) AS max,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY time_seconds) OVER () AS '95 percentil'
from table 

If I run this query, SQL Server asks me to include "time_seconds" in a GROUP BY clause, but then the aggregate functions no longer return a single row for the whole table.

The output I'd like to get is something like this:

| min | max | 95 percentil |
| ---:| ---:| ------------:|
| 10 | 120 | 110 |

Thanks for your help!

Answer 1

Score: 3

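Unfortunately, in SQL Server `PERCENTILE_CONT` works only as a window function, not as an aggregate function. This is one of the (rare) cases where `select distinct` comes in handy: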

select distinct
    min(time_seconds) over() as min_time_seconds,
    max(time_seconds) over() as max_time_seconds,
    percentile_cont(0.95) within group (order by time_seconds) over () as percentil_95
from mytable

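Basically, this approaches the problem from the opposite direction: since we can't compute the percentile as an aggregate, we turn the other aggregate functions into window functions. With an empty `over ()` clause, every window computation yields the same value on all rows, so we can then deduplicate the result set with `distinct`.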

| min_time_seconds | max_time_seconds | percentil_95 |
| ----------------:| ----------------:| ------------:|
| 10 | 120 | 112.8 |
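
The 112.8 in the table above is consistent with linear interpolation between the two closest ranks. As a worked check on the sample data (the symbols $RN$, $x_i$, $n$ and $p$ are notation introduced here, not taken from the answer): the sorted values are 10, 42, 50, 60, 80, 96, 120, so $n = 7$, and with $p = 0.95$:

$$
\begin{aligned}
RN &= 1 + p\,(n - 1) = 1 + 0.95 \times 6 = 6.7 \\
P_{0.95} &= x_6 + (RN - 6)\,(x_7 - x_6) = 96 + 0.7 \times (120 - 96) = 112.8
\end{aligned}
$$

The fractional rank 6.7 falls between the 6th value (96) and the 7th value (120), which is why the result is not one of the stored values.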

[fiddle](https://dbfiddle.uk/z_Yz7qLR) - this runs on SQL Server 2014, since db<>fiddle does not support 2012.
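
If `distinct` feels too implicit, a possible alternative (a sketch, not the answerer's query) is to compute the percentile in a derived table and then aggregate over it. The derived-table alias `t` is made up for illustration, and the table name `mytable` is carried over from the query above.

```sql
-- Sketch only: assumes the same mytable(id, time_seconds) as in the answer above.
SELECT
    MIN(time_seconds) AS min_time_seconds,
    MAX(time_seconds) AS max_time_seconds,
    -- The window value is identical on every row, so MAX simply picks it up
    MAX(percentil_95) AS percentil_95
FROM (
    SELECT
        time_seconds,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY time_seconds) OVER () AS percentil_95
    FROM mytable
) AS t;
```

This keeps `MIN` and `MAX` as ordinary aggregates, so the outer query returns one row without relying on deduplication.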

Answer 2

Score: 0

Try this:

;WITH cte AS (
    SELECT
        CAST(COUNT(*) OVER() AS float) AS total_rows,
        ROW_NUMBER() OVER (ORDER BY time_seconds) AS row_number,
        time_seconds
    FROM (
        SELECT 1 AS id, 120 AS time_seconds UNION ALL
        SELECT 2 AS id, 10 AS time_seconds UNION ALL
        SELECT 3 AS id, 50 AS time_seconds UNION ALL
        SELECT 4 AS id, 80 AS time_seconds UNION ALL
        SELECT 5 AS id, 60 AS time_seconds UNION ALL
        SELECT 6 AS id, 42 AS time_seconds UNION ALL
        SELECT 7 AS id, 96 AS time_seconds
    ) AS YourTable
),
percentiles AS (
    -- Position of each row, and of the row after it, on a 0-100 scale
    SELECT
        time_seconds,
        100.0 * ((row_number - 1) / (total_rows - 1)) AS percentile,
        LEAD(time_seconds) OVER (ORDER BY row_number) AS next_time_seconds,
        100.0 * (row_number / (total_rows - 1)) AS next_percentile
    FROM cte
)
SELECT 
    MIN(time_seconds) AS min_time_seconds, 
    MAX(time_seconds) AS max_time_seconds,
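    (
        -- Interpolate linearly between the two rows that bracket the 95th percentile.
        -- TOP (1) avoids a multi-row subquery error when a row lands exactly on 95.
        SELECT TOP (1)
            time_seconds + (next_time_seconds - time_seconds) * (95 - percentile) / (next_percentile - percentile)
        FROM percentiles 
        WHERE percentile <= 95 AND next_percentile >= 95
    ) AS percentile_95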
FROM cte;

Posted by huangapple on June 6, 2023 at 15:48:53
Link: https://go.coder-hub.com/76412460.html