如何在SQL/Snowflake中从时间序列数据中获取最小值和最大值?

huangapple go评论108阅读模式
英文:

How to get min and max from a timeseries data in SQL/Snowflake?

问题

Here's the translated code portion:

  1. 我的数据如下,时间戳按升序排列。

INSERT INTO timeseries (timestamp, value)
VALUES
('2022-01-01 00:00:00', 0.89),
('2022-01-01 10:01:00', 6.89),
('2022-01-02 10:01:21', 10.99),
('2022-01-02 10:07:00', 11.89),
('2022-01-02 12:01:00', 0.89),
('2022-01-02 13:07:00', 6.39),
('2022-01-02 14:00:00', 0.69),
('2022-01-03 14:02:00', 5.39),
('2022-01-03 15:04:00', 6.89),
('2022-01-03 15:00:00', 7.3),
('2022-01-03 15:10:00', 1.89),
('2022-01-03 15:50:00', 0.8);

  1. 我要求获取最小值和最大值,以便在值超过5时生成分钟差异。在上述数据中,形成了三个数据集。

最小值 最大值
2022-01-01 10:01:00 2022-01-02 10:07:00
2022-01-02 13:07:00 2022-01-02 13:07:00
2022-01-03 14:02:00 2022-01-03 15:00:00

  1. 我尝试使用`row_number`来获取顺序,但由于有三个集合,我需要添加列以在分区子句中使用,但我对此没有任何想法?

WITH CTE AS (
SELECT CASE WHEN VALUE > 5 THEN 'ON' ELSE 'OFF' END STATUS, TIMESTAMP, VALUE
FROM TIMESERIES)
SELECT ROW_NUMBER() OVER(PARTITION BY STATUS ORDER BY TIMESTAMP) RN, TIMESTAMP, VALUE FROM CTE
ORDER BY TIMESTAMP;

  1. 这为我提供了所有大于5的行号,但我需要基于数据集来获取行号。
  2. 希望这可以帮助你开始或找到解决方案。顺便说一下,在Snowflake尝试此操作,通用SQL也适用。
  3. <details>
  4. <summary>英文:</summary>
  5. my data is something like this, timestamp is ordered in asc.

INSERT INTO timeseries (timestamp, value)
VALUES
('2022-01-01 00:00:00', 0.89),
('2022-01-01 10:01:00', 6.89),
('2022-01-02 10:01:21', 10.99),
('2022-01-02 10:07:00', 11.89),

('2022-01-02 12:01:00', 0.89),
('2022-01-02 13:07:00', 6.39),
('2022-01-02 14:00:00', 0.69),
('2022-01-03 14:02:00', 5.39),
('2022-01-03 15:04:00', 6.89),
('2022-01-03 15:00:00', 7.3),

('2022-01-03 15:10:00', 1.89),
('2022-01-03 15:50:00', 0.8);

  1. my ask is to get min and max, so i can produce difference in minutes whenever value is going above 5.
  2. in above, there are three data sets formed

min max
2022-01-01 10:01:00 2022-01-02 10:07:00
2022-01-02 13:07:00 2022-01-02 13:07:00
2022-01-03 14:02:00 2022-01-03 15:00:00

  1. I tried row_number to get sequence but since there are three sets there, i need to attribute column so i can use in partition clause but i am not getting any ideas for that?

WITH CTE AS (
SELECT CASE WHEN VALUE>5 THEN 'ON' ELSE 'OFF' END STATUS , TIMESTAMP, VALUE
FROM TIMESERIES)
SELECT ROW_NUMBER() OVER(PARTITION BY STATUS ORDER BY TIMESTAMP) RN,TIMESTAMP,VALUE FROM CTE
ORDER BY TIMESTAMP;

  1. this is giving me row_number for all &gt;5 but I need based on sets of data..
  2. any help to start or a solution is highly appreciated.
  3. BDW, trying this in snowflake but general SQL is also fine.
  4. </details>
  5. # 答案1
  6. **得分**: 1
  7. 标准间隔和岛屿:
  8. ```sql
  9. with flagged as (
  10. select *,
  11. case when
  12. lag(val, 1, 0.0) over (order by ts) <= 5
  13. and val > 5 then 1 end as flag
  14. from T
  15. ), grouped as (
  16. select *,
  17. count(flag) over (order by ts) as grp
  18. from flagged
  19. )
  20. select
  21. min(ts),
  22. max(case when val > 5 then ts end)
  23. from grouped
  24. group by grp
  25. having max(val) > 5
  26. order by grp;
  27. ```
  28. https://dbfiddle.uk/aGE2n6nk
  29. <details>
  30. <summary>英文:</summary>
  31. Standard gaps and islands:
  32. with flagged as (
  33. select *,
  34. case when
  35. lag(val, 1, 0.0) over (order by ts) &lt;= 5
  36. and val &gt; 5 then 1 end as flag
  37. from T
  38. ), grouped as (
  39. select *,
  40. count(flag) over (order by ts) as grp
  41. from flagged
  42. )
  43. select
  44. min(ts),
  45. max(case when val &gt; 5 then ts end)
  46. from grouped
  47. group by grp
  48. having max(val) &gt; 5
  49. order by grp;
  50. https://dbfiddle.uk/aGE2n6nk
  51. </details>
  52. # 答案2
  53. **得分**: 0
  54. 以下是翻译好的代码部分:
  55. ```sql
  56. 使用 Snowflake 的半结构化功能(数组)提供的解决方案:
  57. with ranges as (
  58. select
  59. case when lag(value,1,0.0) over (order by timestamp) &lt; 5 and value &gt;=5 then min(timestamp) end st_time,
  60. case when lead(value,1,0.0) over (order by timestamp) &lt; 5 and value &gt;=5 then max(timestamp) end end_time
  61. from timeseries
  62. group by value, timestamp),
  63. ranges_arrays as (
  64. select
  65. array_agg(st_time) st_time_arr,
  66. array_agg(end_time) end_time_arr,
  67. array_size(st_time_arr) row_count,
  68. array_generate_range( 0 , row_count ) idx_arr
  69. from ranges)
  70. select
  71. st_time_arr[idx]::timestamp st_time,
  72. end_time_arr[idx]::timestamp end_time
  73. from ranges_arrays, (select value::integer idx from ranges_arrays, lateral flatten(input=&gt;idx_arr))
  74. ;
  75. ```
  76. <details>
  77. <summary>英文:</summary>
  78. Heres a solution using Snowflakes semi-structured functionality (arrays):
  79. with ranges as (
  80. select
  81. case when lag(value,1,0.0) over (order by timestamp) &lt; 5 and value &gt;=5 then min(timestamp) end st_time,
  82. case when lead(value,1,0.0) over (order by timestamp) &lt; 5 and value &gt;=5 then max(timestamp) end end_time
  83. from timeseries
  84. group by value, timestamp),
  85. ranges_arrays as (
  86. select
  87. array_agg(st_time) st_time_arr,
  88. array_agg(end_time) end_time_arr,
  89. array_size(st_time_arr) row_count,
  90. array_generate_range( 0 , row_count ) idx_arr
  91. from ranges)
  92. select
  93. st_time_arr[idx]::timestamp st_time,
  94. end_time_arr[idx]::timestamp end_time
  95. from ranges_arrays, (select value::integer idx from ranges_arrays, lateral flatten(input=&gt;idx_arr))
  96. ;
  97. </details>
  98. # 答案3
  99. **得分**: 0
  100. Here is the translated code:
  101. ```sql
  102. 如果我们使用shawnt00提供的良好VALUES,在CTE中:
  103. with T(ts, val) as (
  104. select * from values
  105. ('2022-01-01 00:00:00'::datetime, 0.89),
  106. ('2022-01-01 10:01:00'::datetime, 6.89),
  107. ('2022-01-02 10:01:21'::datetime, 10.99),
  108. ('2022-01-02 10:07:00'::datetime, 11.89),
  109. ('2022-01-02 12:01:00'::datetime, 0.89),
  110. ('2022-01-02 13:07:00'::datetime, 6.39),
  111. ('2022-01-02 14:00:00'::datetime, 0.69),
  112. ('2022-01-03 14:02:00'::datetime, 5.39),
  113. ('2022-01-03 15:04:00'::datetime, 6.89),
  114. ('2022-01-03 15:06:00'::datetime, 7.3),
  115. ('2022-01-03 15:10:00'::datetime, 1.89),
  116. ('2022-01-03 15:50:00'::datetime, 0.8)
  117. )
  118. 然后,我们可以使用[CONDITIONAL_CHANGE_EVENT][1]来查找条件“大于5”何时切换状态:
  119. select
  120. ts,
  121. val > 5 as gt_five,
  122. CONDITIONAL_CHANGE_EVENT(gt_five) over(order by ts) as ct
  123. from T
  124. 这将产生以下结果:
  125. TS| GT_FIVE| CT
  126. --|--|--
  127. 2022-01-01 00:00:00.000| FALSE| 0
  128. 2022-01-01 10:01:00.000| TRUE| 1
  129. 2022-01-02 10:01:21.000| TRUE| 1
  130. 2022-01-02 10:07:00.000| TRUE| 1
  131. 2022-01-02 12:01:00.000| FALSE| 2
  132. 2022-01-02 13:07:00.000| TRUE| 3
  133. 2022-01-02 14:00:00.000| FALSE| 4
  134. 2022-01-03 14:02:00.000| TRUE| 5
  135. 2022-01-03 15:04:00.000| TRUE| 5
  136. 2022-01-03 15:06:00.000| TRUE| 5
  137. 2022-01-03 15:10:00.000| FALSE| 6
  138. 2022-01-03 15:50:00.000| FALSE| 6
  139. 我们可以使用`gt_five`来过滤掉不需要的间隙,然后通过CONDITIONAL_CHANGE_EVENT的输出进行分组以获取最小/最大值。因此,SQL可以是:
  140. WITH flagged_and_stripped as (
  141. select
  142. ts,
  143. val > 5 as gt_five,
  144. CONDITIONAL_CHANGE_EVENT(gt_five) over(order by ts) as ct
  145. from T
  146. QUALIFY gt_five
  147. )
  148. select
  149. min(ts) as min
  150. ,max(ts) as max
  151. from flagged_and_stripped
  152. group by ct
  153. order by 1;
  154. 产生以下结果:
  155. MIN|MAX
  156. --|--
  157. 2022-01-01 10:01:00.000| 2022-01-02 10:07:00.000
  158. 2022-01-02 13:07:00.000| 2022-01-02 13:07:00.000
  159. 2022-01-03 14:02:00.000| 2022-01-03 15:06:00.000
  160. 因此,复杂示例代码如下:
  161. ```sql
  162. with T(ts, val) as (
  163. select * from values
  164. ('2022-01-01 00:00:00'::datetime, 0.89),
  165. ('2022-01-01 10:01:00'::datetime, 6.89),
  166. ('2022-01-02 10:01:21'::datetime, 10.99),
  167. ('2022-01-02 10:07:00'::datetime, 11.89),
  168. ('2022-01-02 12:01:00'::datetime, 0.89),
  169. ('2022-01-02 13:07:00'::datetime, 6.39),
  170. ('2022-01-02 14:00:00'::datetime, 0.69),
  171. ('2022-01-03 14:02:00'::datetime, 5.39),
  172. ('2022-01-03 15:04:00'::datetime, 6.89),
  173. ('2022-01-03 15:06:00'::datetime, 7.3),
  174. ('2022-01-03 15:10:00'::datetime, 1.89),
  175. ('2022-01-03 15:50:00'::datetime, 0.8)
  176. ), flagged_and_stripped as (
  177. select
  178. ts,
  179. val > 5 as gt_five,
  180. CONDITIONAL_CHANGE_EVENT(gt_five) over(order by ts) as ct
  181. from T
  182. QUALIFY gt_five
  183. )
  184. select
  185. min(ts) as min
  186. ,max(ts) as max
  187. from flagged_and_stripped
  188. group by ct
  189. order by 1;
  190. ```
  191. [1]: https://docs.snowflake.com/en/sql-reference/functions/conditional_change_event
  192. ```
  193. <details>
  194. <summary>英文:</summary>
  195. If we use the nice VALUES provided by shawnt00, in a CTE
  196. ```
  197. with T(ts, val) as (
  198. select * from values
  199. (&#39;2022-01-01 00:00:00&#39;::datetime, 0.89),
  200. (&#39;2022-01-01 10:01:00&#39;::datetime, 6.89),
  201. (&#39;2022-01-02 10:01:21&#39;::datetime, 10.99),
  202. (&#39;2022-01-02 10:07:00&#39;::datetime, 11.89),
  203. (&#39;2022-01-02 12:01:00&#39;::datetime, 0.89),
  204. (&#39;2022-01-02 13:07:00&#39;::datetime, 6.39),
  205. (&#39;2022-01-02 14:00:00&#39;::datetime, 0.69),
  206. (&#39;2022-01-03 14:02:00&#39;::datetime, 5.39),
  207. (&#39;2022-01-03 15:04:00&#39;::datetime, 6.89),
  208. (&#39;2022-01-03 15:06:00&#39;::datetime, 7.3),
  209. (&#39;2022-01-03 15:10:00&#39;::datetime, 1.89),
  210. (&#39;2022-01-03 15:50:00&#39;::datetime, 0.8)
  211. )
  212. ```
  213. we can then use [CONDITIONAL_CHANGE_EVENT][1] to find when the conditional &quot;greater than 5&quot; swaps state:
  214. ```
  215. select
  216. ts,
  217. val &gt; 5 as gt_five,
  218. CONDITIONAL_CHANGE_EVENT(gt_five) over(order by ts) as ct
  219. from T
  220. ```
  221. this gives:
  222. TS| GT_FIVE| CT
  223. --|--|--
  224. 2022-01-01 00:00:00.000| FALSE| 0
  225. 2022-01-01 10:01:00.000| TRUE| 1
  226. 2022-01-02 10:01:21.000| TRUE| 1
  227. 2022-01-02 10:07:00.000| TRUE| 1
  228. 2022-01-02 12:01:00.000| FALSE| 2
  229. 2022-01-02 13:07:00.000| TRUE| 3
  230. 2022-01-02 14:00:00.000| FALSE| 4
  231. 2022-01-03 14:02:00.000| TRUE| 5
  232. 2022-01-03 15:04:00.000| TRUE| 5
  233. 2022-01-03 15:06:00.000| TRUE| 5
  234. 2022-01-03 15:10:00.000| FALSE| 6
  235. 2022-01-03 15:50:00.000| FALSE| 6
  236. we can use the `gt_five` to filter out the unwanted gaps with a QUALIFY, and then group via the output of the CONDITIONAL_CHANGE_EVENT to get the min/max. Thus the sql can be:
  237. ```
  238. WITH flagged_and_stripped as (
  239. select
  240. ts,
  241. val &gt; 5 as gt_five,
  242. CONDITIONAL_CHANGE_EVENT(gt_five) over(order by ts) as ct
  243. from T
  244. QUALIFY gt_five
  245. )
  246. select
  247. min(ts) as min
  248. ,max(ts) as max
  249. from flagged_and_stripped
  250. group by ct
  251. order by 1;
  252. ```
  253. giving:
  254. MIN |MAX
  255. --|--
  256. 2022-01-01 10:01:00.000| 2022-01-02 10:07:00.000
  257. 2022-01-02 13:07:00.000 |2022-01-02 13:07:00.000
  258. 2022-01-03 14:02:00.000| 2022-01-03 15:06:00.000
  259. thus the complex example code is:
  260. ```
  261. with T(ts, val) as (
  262. select * from values
  263. (&#39;2022-01-01 00:00:00&#39;::datetime, 0.89),
  264. (&#39;2022-01-01 10:01:00&#39;::datetime, 6.89),
  265. (&#39;2022-01-02 10:01:21&#39;::datetime, 10.99),
  266. (&#39;2022-01-02 10:07:00&#39;::datetime, 11.89),
  267. (&#39;2022-01-02 12:01:00&#39;::datetime, 0.89),
  268. (&#39;2022-01-02 13:07:00&#39;::datetime, 6.39),
  269. (&#39;2022-01-02 14:00:00&#39;::datetime, 0.69),
  270. (&#39;2022-01-03 14:02:00&#39;::datetime, 5.39),
  271. (&#39;2022-01-03 15:04:00&#39;::datetime, 6.89),
  272. (&#39;2022-01-03 15:06:00&#39;::datetime, 7.3),
  273. (&#39;2022-01-03 15:10:00&#39;::datetime, 1.89),
  274. (&#39;2022-01-03 15:50:00&#39;::datetime, 0.8)
  275. ), flagged_and_stripped as (
  276. select
  277. ts,
  278. val &gt; 5 as gt_five,
  279. CONDITIONAL_CHANGE_EVENT(gt_five) over(order by ts) as ct
  280. from T
  281. QUALIFY gt_five
  282. )
  283. select
  284. min(ts) as min
  285. ,max(ts) as max
  286. from flagged_and_stripped
  287. group by ct
  288. order by 1;
  289. ```
  290. [1]: https://docs.snowflake.com/en/sql-reference/functions/conditional_change_event
  291. </details>

huangapple
  • 本文由 发表于 2023年7月13日 15:32:02
  • 转载请务必保留本文链接:https://go.coder-hub.com/76676939.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定