如何在SQL/Snowflake中从时间序列数据中获取最小值和最大值?

huangapple go评论86阅读模式
英文:

How to get min and max from a timeseries data in SQL/Snowflake?

问题

Here's the translated code portion:

我的数据如下,时间戳按升序排列。

INSERT INTO timeseries (timestamp, value)
VALUES
('2022-01-01 00:00:00', 0.89),
('2022-01-01 10:01:00', 6.89),
('2022-01-02 10:01:21', 10.99),
('2022-01-02 10:07:00', 11.89),
('2022-01-02 12:01:00', 0.89),
('2022-01-02 13:07:00', 6.39),
('2022-01-02 14:00:00', 0.69),
('2022-01-03 14:02:00', 5.39),
('2022-01-03 15:04:00', 6.89),
('2022-01-03 15:00:00', 7.3),
('2022-01-03 15:10:00', 1.89),
('2022-01-03 15:50:00', 0.8);


我要求获取最小值和最大值,以便在值超过5时生成分钟差异。在上述数据中,形成了三个数据集。

最小值 最大值
2022-01-01 10:01:00 2022-01-02 10:07:00
2022-01-02 13:07:00 2022-01-02 13:07:00
2022-01-03 14:02:00 2022-01-03 15:00:00


我尝试使用`row_number`来获取顺序,但由于有三个集合,我需要添加列以在分区子句中使用,但我对此没有任何想法?

WITH CTE AS (
SELECT CASE WHEN VALUE > 5 THEN 'ON' ELSE 'OFF' END STATUS, TIMESTAMP, VALUE
FROM TIMESERIES)
SELECT ROW_NUMBER() OVER(PARTITION BY STATUS ORDER BY TIMESTAMP) RN, TIMESTAMP, VALUE FROM CTE
ORDER BY TIMESTAMP;


这为我提供了所有大于5的行号,但我需要基于数据集来获取行号。

希望这可以帮助你开始或找到解决方案。顺便说一下,在Snowflake尝试此操作,通用SQL也适用。

<details>
<summary>英文:</summary>

my data is something like this, timestamp is ordered in asc.

INSERT INTO timeseries (timestamp, value)
VALUES
('2022-01-01 00:00:00', 0.89),
('2022-01-01 10:01:00', 6.89),
('2022-01-02 10:01:21', 10.99),
('2022-01-02 10:07:00', 11.89),

('2022-01-02 12:01:00', 0.89),
('2022-01-02 13:07:00', 6.39),
('2022-01-02 14:00:00', 0.69),
('2022-01-03 14:02:00', 5.39),
('2022-01-03 15:04:00', 6.89),
('2022-01-03 15:00:00', 7.3),

('2022-01-03 15:10:00', 1.89),
('2022-01-03 15:50:00', 0.8);


my ask is to get min and max, so i can produce difference in minutes whenever value is going above 5.
in above, there are three data sets formed

min max
2022-01-01 10:01:00 2022-01-02 10:07:00
2022-01-02 13:07:00 2022-01-02 13:07:00
2022-01-03 14:02:00 2022-01-03 15:00:00

I tried row_number to get sequence but since there are three sets there, i need to attribute column so i can use in partition clause but i am not getting any ideas for that?

WITH CTE AS (
SELECT CASE WHEN VALUE>5 THEN 'ON' ELSE 'OFF' END STATUS , TIMESTAMP, VALUE
FROM TIMESERIES)
SELECT ROW_NUMBER() OVER(PARTITION BY STATUS ORDER BY TIMESTAMP) RN,TIMESTAMP,VALUE FROM CTE
ORDER BY TIMESTAMP;

this is giving me row_number for all &gt;5 but I need based on sets of data..
any help to start or a solution is highly appreciated.
BDW, trying this in snowflake but general SQL is also fine.
</details>
# 答案1
**得分**: 1
标准间隔和岛屿:
```sql
with flagged as (
select *,
case when
lag(val, 1, 0.0) over (order by ts) <= 5
and val > 5 then 1 end as flag
from T
), grouped as (
select *,
count(flag) over (order by ts) as grp
from flagged
)
select
min(ts),
max(case when val > 5 then ts end)
from grouped
group by grp
having max(val) > 5
order by grp;
```
https://dbfiddle.uk/aGE2n6nk
<details>
<summary>英文:</summary>
Standard gaps and islands:
with flagged as (
select *,
case when
lag(val, 1, 0.0) over (order by ts) &lt;= 5
and val &gt; 5 then 1 end as flag
from T
), grouped as (
select *,
count(flag) over (order by ts) as grp
from flagged
)
select
min(ts),
max(case when val &gt; 5 then ts end)
from grouped
group by grp
having max(val) &gt; 5
order by grp;
https://dbfiddle.uk/aGE2n6nk
</details>
# 答案2
**得分**: 0
以下是翻译好的代码部分:
```sql
使用 Snowflake 的半结构化功能(数组)提供的解决方案:
with ranges as (
select 
case when lag(value,1,0.0) over (order by timestamp) &lt; 5 and value &gt;=5 then min(timestamp) end st_time,
case when lead(value,1,0.0) over (order by timestamp) &lt; 5 and value &gt;=5 then max(timestamp) end end_time
from timeseries
group by value, timestamp),
ranges_arrays as (
select 
array_agg(st_time) st_time_arr,
array_agg(end_time) end_time_arr,
array_size(st_time_arr) row_count,
array_generate_range( 0 , row_count ) idx_arr
from  ranges)
select 
st_time_arr[idx]::timestamp st_time, 
end_time_arr[idx]::timestamp end_time 
from ranges_arrays, (select value::integer idx from ranges_arrays, lateral flatten(input=&gt;idx_arr))
;
```
<details>
<summary>英文:</summary>
Heres a solution using Snowflakes semi-structured functionality (arrays):
with ranges as (
select 
case when lag(value,1,0.0) over (order by timestamp) &lt; 5 and value &gt;=5 then min(timestamp) end st_time,
case when lead(value,1,0.0) over (order by timestamp) &lt; 5 and value &gt;=5 then max(timestamp) end end_time
from timeseries
group by value, timestamp),
ranges_arrays as (
select 
array_agg(st_time) st_time_arr,
array_agg(end_time) end_time_arr,
array_size(st_time_arr) row_count,
array_generate_range( 0 , row_count ) idx_arr
from  ranges)
select 
st_time_arr[idx]::timestamp st_time, 
end_time_arr[idx]::timestamp end_time 
from ranges_arrays, (select value::integer idx from ranges_arrays, lateral flatten(input=&gt;idx_arr))
;  
</details>
# 答案3
**得分**: 0
Here is the translated code:
```sql
如果我们使用shawnt00提供的良好VALUES,在CTE中:
with T(ts, val) as (
select * from values
('2022-01-01 00:00:00'::datetime, 0.89),
('2022-01-01 10:01:00'::datetime, 6.89),
('2022-01-02 10:01:21'::datetime, 10.99),
('2022-01-02 10:07:00'::datetime, 11.89),
('2022-01-02 12:01:00'::datetime, 0.89),
('2022-01-02 13:07:00'::datetime, 6.39),
('2022-01-02 14:00:00'::datetime, 0.69),
('2022-01-03 14:02:00'::datetime, 5.39),
('2022-01-03 15:04:00'::datetime, 6.89),
('2022-01-03 15:06:00'::datetime, 7.3),
('2022-01-03 15:10:00'::datetime, 1.89),
('2022-01-03 15:50:00'::datetime, 0.8)
)
然后,我们可以使用[CONDITIONAL_CHANGE_EVENT][1]来查找条件“大于5”何时切换状态:
select 
ts, 
val > 5 as gt_five,
CONDITIONAL_CHANGE_EVENT(gt_five) over(order by ts) as ct
from T
这将产生以下结果:
TS| GT_FIVE| CT
--|--|--
2022-01-01 00:00:00.000| FALSE| 0
2022-01-01 10:01:00.000| TRUE| 1
2022-01-02 10:01:21.000| TRUE| 1
2022-01-02 10:07:00.000| TRUE| 1
2022-01-02 12:01:00.000| FALSE| 2
2022-01-02 13:07:00.000| TRUE| 3
2022-01-02 14:00:00.000| FALSE| 4
2022-01-03 14:02:00.000| TRUE| 5
2022-01-03 15:04:00.000| TRUE| 5
2022-01-03 15:06:00.000| TRUE| 5
2022-01-03 15:10:00.000| FALSE| 6
2022-01-03 15:50:00.000| FALSE| 6
我们可以使用`gt_five`来过滤掉不需要的间隙,然后通过CONDITIONAL_CHANGE_EVENT的输出进行分组以获取最小/最大值。因此,SQL可以是:
WITH flagged_and_stripped as (
select 
ts, 
val > 5 as gt_five,
CONDITIONAL_CHANGE_EVENT(gt_five) over(order by ts) as ct
from T
QUALIFY gt_five
)
select
min(ts) as min
,max(ts) as max
from flagged_and_stripped
group by ct
order by 1;
产生以下结果:
MIN|MAX
--|--
2022-01-01 10:01:00.000| 2022-01-02 10:07:00.000
2022-01-02 13:07:00.000| 2022-01-02 13:07:00.000
2022-01-03 14:02:00.000| 2022-01-03 15:06:00.000
因此,复杂示例代码如下:
```sql
with  T(ts, val) as (
select * from values
('2022-01-01 00:00:00'::datetime, 0.89),
('2022-01-01 10:01:00'::datetime, 6.89),
('2022-01-02 10:01:21'::datetime, 10.99),
('2022-01-02 10:07:00'::datetime, 11.89),
('2022-01-02 12:01:00'::datetime, 0.89),
('2022-01-02 13:07:00'::datetime, 6.39),
('2022-01-02 14:00:00'::datetime, 0.69),
('2022-01-03 14:02:00'::datetime, 5.39),
('2022-01-03 15:04:00'::datetime, 6.89),
('2022-01-03 15:06:00'::datetime, 7.3),
('2022-01-03 15:10:00'::datetime, 1.89),
('2022-01-03 15:50:00'::datetime, 0.8)
), flagged_and_stripped as (
select 
ts, 
val > 5 as gt_five,
CONDITIONAL_CHANGE_EVENT(gt_five) over(order by ts) as ct
from T
QUALIFY gt_five
)
select
min(ts) as min
,max(ts) as max
from flagged_and_stripped
group by ct
order by 1;
```
[1]: https://docs.snowflake.com/en/sql-reference/functions/conditional_change_event
```
<details>
<summary>英文:</summary>
If we use the nice VALUES provided by shawnt00, in a CTE
```
with  T(ts, val) as (
select * from values
(&#39;2022-01-01 00:00:00&#39;::datetime, 0.89),
(&#39;2022-01-01 10:01:00&#39;::datetime, 6.89),
(&#39;2022-01-02 10:01:21&#39;::datetime, 10.99),
(&#39;2022-01-02 10:07:00&#39;::datetime, 11.89),
(&#39;2022-01-02 12:01:00&#39;::datetime, 0.89),
(&#39;2022-01-02 13:07:00&#39;::datetime, 6.39),
(&#39;2022-01-02 14:00:00&#39;::datetime, 0.69),
(&#39;2022-01-03 14:02:00&#39;::datetime, 5.39),
(&#39;2022-01-03 15:04:00&#39;::datetime, 6.89),
(&#39;2022-01-03 15:06:00&#39;::datetime, 7.3),
(&#39;2022-01-03 15:10:00&#39;::datetime, 1.89),
(&#39;2022-01-03 15:50:00&#39;::datetime, 0.8)
)
```
we can then use [CONDITIONAL_CHANGE_EVENT][1] to find when the conditional &quot;greater than 5&quot; swaps state:
```
select 
ts, 
val &gt; 5 as gt_five,
CONDITIONAL_CHANGE_EVENT(gt_five) over(order by ts) as ct
from T
```
this gives:
TS| GT_FIVE| CT
--|--|--
2022-01-01 00:00:00.000| FALSE| 0
2022-01-01 10:01:00.000| TRUE| 1
2022-01-02 10:01:21.000| TRUE| 1
2022-01-02 10:07:00.000| TRUE| 1
2022-01-02 12:01:00.000| FALSE| 2
2022-01-02 13:07:00.000| TRUE| 3
2022-01-02 14:00:00.000| FALSE| 4
2022-01-03 14:02:00.000| TRUE| 5
2022-01-03 15:04:00.000| TRUE| 5
2022-01-03 15:06:00.000| TRUE| 5
2022-01-03 15:10:00.000| FALSE| 6
2022-01-03 15:50:00.000| FALSE| 6
we can use the `gt_five` to filter out the unwanted gaps with a QUALIFY, and then group via the output of the CONDITIONAL_CHANGE_EVENT to get the min/max. Thus the sql can be:
```
WITH flagged_and_stripped as (
select 
ts, 
val &gt; 5 as gt_five,
CONDITIONAL_CHANGE_EVENT(gt_five) over(order by ts) as ct
from T
QUALIFY gt_five
)
select
min(ts) as min
,max(ts) as max
from flagged_and_stripped
group by ct
order by 1;
```
giving:
MIN	|MAX
--|--
2022-01-01 10:01:00.000|	2022-01-02 10:07:00.000
2022-01-02 13:07:00.000	|2022-01-02 13:07:00.000
2022-01-03 14:02:00.000|	2022-01-03 15:06:00.000
thus the complex example code is:
```
with  T(ts, val) as (
select * from values
(&#39;2022-01-01 00:00:00&#39;::datetime, 0.89),
(&#39;2022-01-01 10:01:00&#39;::datetime, 6.89),
(&#39;2022-01-02 10:01:21&#39;::datetime, 10.99),
(&#39;2022-01-02 10:07:00&#39;::datetime, 11.89),
(&#39;2022-01-02 12:01:00&#39;::datetime, 0.89),
(&#39;2022-01-02 13:07:00&#39;::datetime, 6.39),
(&#39;2022-01-02 14:00:00&#39;::datetime, 0.69),
(&#39;2022-01-03 14:02:00&#39;::datetime, 5.39),
(&#39;2022-01-03 15:04:00&#39;::datetime, 6.89),
(&#39;2022-01-03 15:06:00&#39;::datetime, 7.3),
(&#39;2022-01-03 15:10:00&#39;::datetime, 1.89),
(&#39;2022-01-03 15:50:00&#39;::datetime, 0.8)
), flagged_and_stripped as (
select 
ts, 
val &gt; 5 as gt_five,
CONDITIONAL_CHANGE_EVENT(gt_five) over(order by ts) as ct
from T
QUALIFY gt_five
)
select
min(ts) as min
,max(ts) as max
from flagged_and_stripped
group by ct
order by 1;
```
[1]: https://docs.snowflake.com/en/sql-reference/functions/conditional_change_event
</details>

huangapple
  • 本文由 发表于 2023年7月13日 15:32:02
  • 转载请务必保留本文链接:https://go.coder-hub.com/76676939.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定