英文:
Moving AVG over variable number of rows
问题
在Snowflake中,我想为我的数据计算一个移动平均值,但平均值的行数可以变化。具体来说,我想将在当前行日期之前30天或更少的任何记录纳入平均值计算。但每个日期可能有多条记录,或者有缺失的日期。
例如:
日期 | 值 |
---|---|
1月1日 | 100 |
1月5日 | 200 |
1月20日 | 100 |
2月3日 | 0 |
2月3日 | 500 |
2月10日 | 400 |
3月8日 | 600 |
我希望输出为:
日期 | 30天平均值 | 注释 |
---|---|---|
1月1日 | 100 | (计算为 AVG(100) ) |
1月5日 | 150 | (计算为 AVG(100, 200) ) |
1月20日 | 133 | (计算为 AVG(100, 200, 100) ) |
2月3日 | 200 | (计算为 AVG(200, 100, 0, 500) ) |
2月10日 | 250 | (计算为 AVG(100, 0, 500, 400) ) |
3月8日 | 500 | (计算为 AVG(400, 600) ) |
在这里似乎不能使用窗口框架,因为每次输入到框架中的行数可能会变化(取决于30天前有多少行)。
我应该怎么做?
英文:
In Snowflake, I would like to calculate a moving average for my data, but the number of rows in the average can change. Specifically, I want to factor into the average any records whose date is 30 days or less prior to the date in the current row. But there could be multiple records for each date, or missing dates.
For example:
Date | Value |
---|---|
Jan 1 | 100 |
Jan 5 | 200 |
Jan 20 | 100 |
Feb 3 | 0 |
Feb 3 | 500 |
Feb 10 | 400 |
Mar 8 | 600 |
I want the output to be:
Date | 30d Avg | Note |
---|---|---|
Jan 1 | 100 | (calculated as AVG(100) ) |
Jan 5 | 150 | (calculated as AVG(100, 200) ) |
Jan 20 | 133 | (calculated as AVG(100, 200, 100) ) |
Feb 3 | 200 | (calculated as AVG(200, 100, 0, 500) ) |
Feb 10 | 250 | (calculated as AVG(100, 0, 500, 400) ) |
Mar 8 | 500 | (calculated as AVG(400, 600) ) |
Using window frames doesn't seem to work here, because the number of inputs to the frame can change each time (depending on how many rows are less than 30 days before this one).
How can I do this?
答案1
得分: 2
在大多数数据库上,可以使用窗口函数 AVG
以窗口帧 RANGE BETWEEN INTERVAL
来执行此操作:
SELECT Date, MAX(average) AS '30d_avg'
FROM (
SELECT Date, AVG(Value) OVER (ORDER BY Date RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS average
FROM mytable
) AS s
GROUP BY Date
由于Snowflake不支持RANGE BETWEEN INTERVAL
,因此可以使用以下方法:
SELECT
t1.Date,
AVG(t2.Value) AS 30d_avg
FROM mytable AS t1
INNER JOIN mytable AS t2 ON t2.Date >= DATEADD(DAY, -30, t1.Date)
AND t2.Date <= t1.Date
GROUP BY t1.Date
ORDER BY t1.Date
在MySQL上进行的演示:https://dbfiddle.uk/miO-qNsf
英文:
On most databases, this can be done by using the window function AVG
with window frame RANGE BETWEEN INTERVAL
:
SELECT Date, MAX(average) AS '30d_avg'
FROM (
SELECT Date, AVG(Value) over(ORDER BY Date RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) as average
FROM mytable
) AS s
GROUP BY Date
Since Snowflake does not supportRANGE BETWEEN INTERVAL
this is a workaround :
SELECT
t1.Date,
AVG(t2.Value) AS 30d_avg
FROM mytable AS t1
INNER JOIN mytable AS t2 ON t2.Date >= DATEADD(DAY, -30, t1.Date)
AND t2.Date <= t1.Date
GROUP BY t1.Date
ORDER BY t1.Date
Demo tested on mysql : https://dbfiddle.uk/miO-qNsf
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论