统计基于分组的字符型值的范围内的已排序记录数。

huangapple go评论68阅读模式
英文:

Counting sequenced records in ranges, based on grouped varchar values

问题

I'm trying to calculate count of same values, in a certain pattern, where the pattern might occur again.

To simply, please see below data, and expected result.
Solution is implemented in SQL Server 2016 and above ...

The blocks are sequenced based on ts column.

The expected result should look like:

MAX_OF_PROP1_IN_BLOCK PROP1 PROP2 COUNT_PROP1_IN_BLOCK COUNT_OF_PROP2_IN_BLOCK
2023-05-01 10:51:00 AAAA 01 3 3
2023-05-01 10:48:00 BBBB 01 3 3
2023-05-01 10:45:00 AAAA 01 4 2
2023-05-01 10:45:00 AAAA 02 4 2
2023-05-01 10:41:00 CCCC 01 3 2
2023-05-01 10:41:00 CCCC 02 3 1
2023-05-01 10:38:00 BBBB 01 3 3
2023-05-01 10:35:00 AAAA 01 2 2
2023-05-01 10:33:00 CCCC 02 9 2
2023-05-01 10:33:00 CCCC 01 9 5
2023-05-01 10:33:00 CCCC 02 9 2

I've tried experimenting with window functions, but I'm keep getting always count/max values based on prop1/prop2 within whole scope, and not in ranges defined by prop1/prop2 values.
It is possible to preorganize data with CTE query before...

I believe the solution is quite simple, but after several tries, and searching the internet, I could not find an appropriate example. I believe, RANGE should be used, but I could not get this to work.

Any help/advice would be greatly appreciated.

英文:

I'm trying to calculate count of same values, in a certain pattern, where the pattern might occur again.

To simply, please see below data, and expected result.
Solution is implemented in SQL Server 2016 and above ...

DECLARE @dataTable TABLE
                   (
                       ts DATETIME, 
                       prop1 VARCHAR(4), 
                       prop2 VARCHAR(2)
                   );

INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:51:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:50:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:49:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:48:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:47:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:46:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:45:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:44:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:43:00', 'AAAA', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:32:00', 'AAAA', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:41:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:40:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:39:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:38:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:37:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:36:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:35:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:34:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:33:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:32:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:31:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:30:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:29:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:28:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:27:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:26:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:25:00', 'CCCC', '02');

SELECT 
    *, 
    MAX(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC), 
    COUNT(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC )
FROM 
    @dataTable 
ORDER BY 
    ts DESC;

The blocks are sequenced based on ts column.

The expected result should look like:

MAX_OF_PROP1_IN_BLOCK PROP1 PROP2 COUNT_PROP1_IN_BLOCK COUNT_OF_PROP2_IN_BLOCK
2023-05-01 10:51:00 AAAA 01 3 3
2023-05-01 10:48:00 BBBB 01 3 3
2023-05-01 10:45:00 AAAA 01 4 2
2023-05-01 10:45:00 AAAA 02 4 2
2023-05-01 10:41:00 CCCC 01 3 2
2023-05-01 10:41:00 CCCC 02 3 1
2023-05-01 10:38:00 BBBB 01 3 3
2023-05-01 10:35:00 AAAA 01 2 2
2023-05-01 10:33:00 CCCC 02 9 2
2023-05-01 10:33:00 CCCC 01 9 5
2023-05-01 10:33:00 CCCC 02 9 2

I've tried experimenting with window functions, but I'm keep getting always count/max values based on prop1/prop2 within whole scope, and not in ranges defined by prop1/prop2 values.
It is possible to preorganized data with CTE query before...

SELECT * 
, MAX(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC)
, COUNT(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC )
FROM @dataTable
ORDER BY ts DESC;

I believe the solution is quite simple, but after several tries, and searching the internet, I could not find appropriate example. I believe, RANGE should be used, but I could not get this to work.

Any help/advice would be greatly appreciated.

答案1

得分: 2

对于假设有两行包含 10:32:00 是一个打字错误的情况,您面临一个 gaps and islands 问题。

我更喜欢的方法是使用 lag 来在下一行的值不同时设置一个标志,然后将其求和以提供代表每个 island 的累积总数;有了这个设置后,您可以计算和汇总最终的数据。

请查看此演示示例

以下是代码部分,不需要翻译。

英文:

On the assumption that having two rows with 10:32:00 is a typo, you have a gaps and islands problem.

The method I prefer is to set a flag by using lag whenever the value in the next row is different, then sum this to provide a running total that represents each island; with this in place you can then count and aggregate the final data.

See this Demo Fiddle

with diff as (
  select *, 
    case when lag(prop1) over(order by ts desc ) = prop1 then 0 else 1 end gp1,
    case when lag(prop2) over(order by ts desc ) = prop2 then 0 else 1 end gp2
  from t
), grp as (
  select *, 
    Sum(gp1) over(order by ts desc) grp1, Sum(gp2) over(order by ts desc) grp2
  from diff
)
select Max_Of_Prop_In_Block, prop1, prop2, 
  grp1Count Count_Of_Prop1_In_Block, 
  grp2Count Count_Of_Prop2_In_Block
from (
  select *, 
	Max(ts) over(partition by grp1) Max_Of_Prop_In_Block, 
	Count(*) over(partition by grp1) grp1Count, 
	Count(*) over(partition by grp1, grp2) grp2Count
  from grp
)t
group by Max_Of_Prop_In_Block, grp1Count, prop1, prop2, grp2Count, grp2
order by Max_Of_Prop_In_Block desc, Max(grp2);

huangapple
  • 本文由 发表于 2023年6月6日 03:23:11
  • 转载请务必保留本文链接:https://go.coder-hub.com/76409442.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定