Azure Stream Analytics: window function to aggregate the current event record, referencing data since 12 AM UTC

Question

I have a scenario where I need to aggregate only the key of the latest event in a Stream Analytics job, using the data accumulated since 12 AM UTC each day. It looks like I can't rely on hopping or sliding windows alone, because every time they produce output they re-aggregate all records and key combinations from 12 AM until now. I only want to aggregate the current key, referencing the older data for that same key since 12 AM UTC.

Example:

Data in the event hub (until 10:59 AM):
1, 100, 5AM
2, 50, 8AM
3, 60, 10AM

Current record at 11AM:
2, 50, 11AM

Expected output:
1, 100, 5AM
2, 150, 11AM
3, 60, 10AM

I don't want the stream job to re-execute the aggregation for the older keys 1 and 3.

Stream Analytics has windowing functions (https://learn.microsoft.com/en-us/stream-analytics-query/windows-azure-stream-analytics), and different window functions such as Hopping, Sliding, and Tumbling can be used in the same query. Can a combination of sliding and tumbling windows solve the problem? My thinking is that the tumbling window would aggregate the latest data, while the sliding window would provide the reference data from 12 AM UTC. Any help is really appreciated.

Answer 1

Score: 0

I was able to resolve the issue by using a 24-hour sliding window and filtering out the data from before 12 AM in the WHERE clause.

-- [table] stands for the input alias; dt is the column from the original post
WITH cte AS (
    SELECT COUNT(*) AS event_count, dt
    FROM [table]
    GROUP BY SlidingWindow(hour, 24), dt
)
SELECT * FROM cte
WHERE dt = System.Timestamp()

Posted on June 8, 2023, 12:07:01 (https://go.coder-hub.com/76428550.html)
