移动平均值,涵盖可变数量的行。

huangapple go评论66阅读模式
英文:

Moving AVG over variable number of rows

问题

在Snowflake中,我想为我的数据计算一个移动平均值,但平均值的行数可以变化。具体来说,我想将在当前行日期之前30天或更少的任何记录纳入平均值计算。但每个日期可能有多条记录,或者有缺失的日期。

例如:

日期
1月1日 100
1月5日 200
1月20日 100
2月3日 0
2月3日 500
2月10日 400
3月8日 600

我希望输出为:

日期 30天平均值 注释
1月1日 100 (计算为 AVG(100))
1月5日 150 (计算为 AVG(100, 200))
1月20日 133 (计算为 AVG(100, 200, 100))
2月3日 200 (计算为 AVG(200, 100, 0, 500))
2月10日 250 (计算为 AVG(100, 0, 500, 400))
3月8日 500 (计算为 AVG(400, 600))

在这里似乎不能使用窗口框架,因为每次输入到框架中的行数可能会变化(取决于30天前有多少行)。

我应该怎么做?

英文:

In Snowflake, I would like to calculate a moving average for my data, but the number of rows in the average can change. Specifically, I want to factor into the average any records whose date is 30 days or less prior to the date in the current row. But there could be multiple records for each date, or missing dates.

For example:

Date Value
Jan 1 100
Jan 5 200
Jan 20 100
Feb 3 0
Feb 3 500
Feb 10 400
Mar 8 600

I want the output to be:

Date 30d Avg Note
Jan 1 100 (calculated as AVG(100))
Jan 5 150 (calculated as AVG(100, 200))
Jan 20 133 (calculated as AVG(100, 200, 100))
Feb 3 200 (calculated as AVG(200, 100, 0, 500))
Feb 10 250 (calculated as AVG(100, 0, 500, 400))
Mar 8 500 (calculated as AVG(400, 600))

Using window frames doesn't seem to work here, because the number of inputs to the frame can change each time (depending on how many rows are less than 30 days before this one).

How can I do this?

答案1

得分: 2

在大多数数据库上,可以使用窗口函数 AVG 以窗口帧 RANGE BETWEEN INTERVAL 来执行此操作:

SELECT Date, MAX(average) AS '30d_avg'
FROM (
  SELECT Date, AVG(Value) OVER (ORDER BY Date RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS average
  FROM mytable
) AS s
GROUP BY Date

由于Snowflake不支持RANGE BETWEEN INTERVAL,因此可以使用以下方法:

SELECT 
   t1.Date,
   AVG(t2.Value) AS 30d_avg
FROM mytable AS t1
INNER JOIN mytable AS t2 ON t2.Date >= DATEADD(DAY, -30, t1.Date)  
                         AND t2.Date <= t1.Date
GROUP BY t1.Date
ORDER BY t1.Date

在MySQL上进行的演示:https://dbfiddle.uk/miO-qNsf

英文:

On most databases, this can be done by using the window function AVG with window frame RANGE BETWEEN INTERVAL:

SELECT Date, MAX(average) AS &#39;30d_avg&#39;
FROM (
  SELECT Date, AVG(Value) over(ORDER BY Date RANGE BETWEEN INTERVAL &#39;30&#39; DAY PRECEDING AND CURRENT ROW) as average
  FROM mytable
) AS s
GROUP BY Date

Since Snowflake does not supportRANGE BETWEEN INTERVAL this is a workaround :

SELECT 
   t1.Date,
   AVG(t2.Value) AS 30d_avg
FROM mytable AS t1
INNER JOIN mytable AS t2 ON t2.Date &gt;=  DATEADD(DAY, -30, t1.Date)  
                         AND t2.Date &lt;= t1.Date
GROUP BY t1.Date
ORDER BY t1.Date

Demo tested on mysql : https://dbfiddle.uk/miO-qNsf

huangapple
  • 本文由 发表于 2023年6月11日 20:53:25
  • 转载请务必保留本文链接:https://go.coder-hub.com/76450557.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定