SQL window function to compare values by date
Question
I have a table with a Date column and a value column.
Date | Value |
---|---|
2/28/2023 | 120 |
1/31/2023 | 127.2 |
1/1/2023 | 100 |
4/5/2022 | 110 |
I want to create two more columns that compare each value with earlier values based on date (a 30-day change and a 60-day change).
Here is the desired output:
Date | Value | Change in last 30 days | Change in last 60 days |
---|---|---|---|
2/28/2023 | 120 | -6% | 20% |
1/31/2023 | 127.2 | 27% | |
1/1/2023 | 100 | | |
4/5/2022 | 110 | | |
So the output table compares the value on 2/28/2023 with the value on 1/31/2023 (within 30 days) and with the value on 1/1/2023 (within 60 days). The value on 1/31/2023 is compared with the value on 1/1/2023, which is within 30 days.
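As a quick arithmetic check on the expected output, each percentage is the current value divided by the earlier value, minus one. The small Python sketch below reproduces the three numbers in the table. The helper name pct_change is only for illustration, and Python's ".0%" format spec stands in for whatever percent formatting the database would apply.

```python
def pct_change(new: float, old: float) -> str:
    """Percent change from old to new, rounded to a whole percent."""
    return format(new / old - 1, ".0%")

# Values taken from the question's table
print(pct_change(120, 127.2))  # 2/28/2023 vs 1/31/2023 -> -6%
print(pct_change(120, 100))    # 2/28/2023 vs 1/1/2023  -> 20%
print(pct_change(127.2, 100))  # 1/31/2023 vs 1/1/2023  -> 27%
```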
How can I do this in SQL?
Thank you.
Answer 1
Score: 2
I don't know if you can use a window function to select based on a relative date range, but you can use a pair of OUTER APPLY(SELECT TOP 1 ...) constructs to select appropriate 30-day and 60-day past values for the calculation.
One ambiguity is how to define the 30-day and 60-day past values when there is no entry exactly 30 or 60 days earlier. Some options are:
- Most recent value at least N days old.
- Oldest value at most N days old.
- Either of the above with additional limits on the range.
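The Python sketch below makes the difference between these options concrete using the question's sample data. The function names are hypothetical. For 2/28/2023 with N = 30, the first option picks the 1/1/2023 value (100). The second option picks the 1/31/2023 value (127.2), which is the one the OP's expected output uses. Passing a lower age bound to oldest_in_range covers the third option.

```python
from datetime import date
from typing import List, Optional, Tuple

Row = Tuple[date, float]

# Sample rows from the question: (date, value)
ROWS: List[Row] = [
    (date(2023, 2, 28), 120.0),
    (date(2023, 1, 31), 127.2),
    (date(2023, 1, 1), 100.0),
    (date(2022, 4, 5), 110.0),
]

def most_recent_at_least(rows: List[Row], day: date, n: int) -> Optional[float]:
    """Option 1: the latest value that is at least n days older than day."""
    older = [(d, v) for d, v in rows if (day - d).days >= n]
    return max(older)[1] if older else None

def oldest_in_range(rows: List[Row], day: date, min_age: int, max_age: int) -> Optional[float]:
    """Options 2 and 3: the earliest value whose age in days lies in
    [min_age, max_age]; min_age=1 means 'oldest value at most max_age days old'."""
    window = [(d, v) for d, v in rows if min_age <= (day - d).days <= max_age]
    return min(window)[1] if window else None

day = date(2023, 2, 28)
print(most_recent_at_least(ROWS, day, 30))  # 100.0 (1/31 is only 28 days old)
print(oldest_in_range(ROWS, day, 1, 30))    # 127.2 (the 1/31 value)
print(oldest_in_range(ROWS, day, 31, 60))   # 100.0 (the 1/1 value, 58 days old)
```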
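For the following, I chose the option that seemed to match the OP's desired results: the oldest value at most 30 days old, and for the 60-day column, the oldest value that is 31 to 60 days old.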
Although result formatting is often best left to the presentation layer, I've used FORMAT( ..., 'P0') to display the results in a percent format. (This was adapted from zhiguang's answer.)
SELECT D.*
    , FORMAT(D.Value / NULLIF(D30.Value, 0) - 1, 'P0') AS [%Change in last 30 days]
    , FORMAT(D.Value / NULLIF(D60.Value, 0) - 1, 'P0') AS [%Change in last 60 days]
FROM Data D
-- Oldest value within the 30 days before D.Date
OUTER APPLY (
    SELECT TOP 1 D1.Value
    FROM Data D1
    WHERE D1.Date < D.Date
    AND D1.Date >= DATEADD(day, -30, D.Date)
    ORDER BY D1.Date
) D30
-- Oldest value from 31 to 60 days before D.Date
OUTER APPLY (
    SELECT TOP 1 D2.Value
    FROM Data D2
    WHERE D2.Date < DATEADD(day, -30, D.Date)
    AND D2.Date >= DATEADD(day, -60, D.Date)
    ORDER BY D2.Date
) D60
An OUTER APPLY is like a left join to a subselect that can include references back to earlier parts of the main query. (A CROSS APPLY is similar, but behaves like an inner join.) OUTER APPLY is used here so that rows with no matching past value are still returned, with NULL for the missing value.
The above also uses the NULLIF() function to protect against potential divide-by-zero errors.
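To try the same lookups without SQL Server, the sketch below runs them in SQLite through Python's standard sqlite3 module. This is a stand-in, not the answer's T-SQL. SQLite has no OUTER APPLY, so each lookup is written as a correlated scalar subquery with ORDER BY and LIMIT 1 in place of TOP 1. As with OUTER APPLY, a subquery that finds no row just yields NULL. SQLite's date(..., '-30 days') stands in for DATEADD. The table and column names (data, dt, value) are assumptions for this example.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server; names are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (dt TEXT PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("2023-02-28", 120.0), ("2023-01-31", 127.2),
     ("2023-01-01", 100.0), ("2022-04-05", 110.0)],
)

QUERY = """
WITH past AS (
    SELECT d.dt, d.value,
           -- oldest value within the 30 days before d.dt (like D30)
           (SELECT d1.value FROM data d1
             WHERE d1.dt < d.dt AND d1.dt >= date(d.dt, '-30 days')
             ORDER BY d1.dt LIMIT 1) AS v30,
           -- oldest value from 31 to 60 days before d.dt (like D60)
           (SELECT d2.value FROM data d2
             WHERE d2.dt < date(d.dt, '-30 days')
               AND d2.dt >= date(d.dt, '-60 days')
             ORDER BY d2.dt LIMIT 1) AS v60
    FROM data d
)
SELECT dt, value,
       value / NULLIF(v30, 0) - 1 AS chg30,  -- NULL when no past value
       value / NULLIF(v60, 0) - 1 AS chg60
FROM past
ORDER BY dt DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

ISO-formatted date strings compare correctly as text, which is why the date filters work on a TEXT column here.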
See this db<>fiddle for a working demo that also includes additional data having multiple values within the same 30-day window.
Note that some of the selected past values might not seem obviously correct until you recognize that some months have lengths other than 30 days. Changing the date range calculations to use month offsets instead of day offsets might make the results more intuitive.
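The Python sketch below shows how day offsets and month offsets can differ. add_months is a hypothetical helper, not a standard library function. It mimics the end-of-month clamping that SQL Server documents for DATEADD(month, ...). For 2023-03-31, a 30-day window starts on 2023-03-01 and would miss an end-of-February reading. A one-month window starts on 2023-02-28 and would include it.

```python
import calendar
from datetime import date, timedelta

def add_months(d: date, months: int) -> date:
    """Shift d by whole months, clamping the day to the last day of the
    target month when it would otherwise not exist (e.g. Mar 31 -> Feb 28)."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

for day in (date(2023, 2, 28), date(2023, 3, 31)):
    print(day, "30 days back:", day - timedelta(days=30),
          "1 month back:", add_months(day, -1))
```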
Sample results:
Date | Value | %Change in last 30 days | %Change in last 60 days | D30_Value | D60_Value |
---|---|---|---|---|---|
2023-02-28 | 120.00 | -6% | 20% | 127.20 | 100.00 |
2023-01-31 | 127.20 | 27% | null | 100.00 | null |
2023-01-01 | 100.00 | null | null | null | null |
2022-04-05 | 110.00 | null | null | null | null |
2021-04-21 | 210.00 | 11% | 40% | 190.00 | 150.00 |
2021-04-11 | 200.00 | 11% | 43% | 180.00 | 140.00 |
2021-04-01 | 190.00 | 12% | 46% | 170.00 | 130.00 |
2021-03-21 | 180.00 | 20% | 50% | 150.00 | 120.00 |
2021-03-11 | 170.00 | 21% | 55% | 140.00 | 110.00 |
2021-03-01 | 160.00 | 23% | 60% | 130.00 | 100.00 |
2021-02-21 | 150.00 | 15% | 50% | 130.00 | 100.00 |
2021-02-11 | 140.00 | 17% | 40% | 120.00 | 100.00 |
2021-02-01 | 130.00 | 18% | 30% | 110.00 | 100.00 |
2021-01-21 | 120.00 | 20% | null | 100.00 | null |
2021-01-11 | 110.00 | 10% | null | 100.00 | null |
2021-01-01 | 100.00 | null | null | null | null |
I'll leave it to the OP to tweak the date range selection logic and the final result rounding and formatting.
Answer 2
Score: 0
Here's a pattern that might be useful: collect the nearest preceding values and then decide which ones to keep. If there is a cap on the number of rows you might need to look back, you can use this method and adjust it for whichever option you prefer.
In this example I'm basically assuming that there might be gaps, but that there will be at most one value per 30 days. While that might not apply to your data, it should be easy to incorporate extra conditions:
with data as (
select *,
lag("date", 1) over (order by "date") as d1,
lag("date", 2) over (order by "date") as d2,
lag(value, 1) over (order by "date") as v1,
lag(value, 2) over (order by "date") as v2
from T
)
select *,
    -- change = current / past - 1, NULL when no lagged row falls in the window
    value / case when datediff(day, d1, "date") between 1 and 30 then v1 end - 1 as "30daychange",
    value / case when datediff(day, d1, "date") between 31 and 60 then v1
                 when datediff(day, d2, "date") between 31 and 60 then v2 end - 1 as "60daychange"
from data;
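Here is a sketch of the same LAG pattern running in SQLite through Python's sqlite3 module. It is a stand-in for the SQL Server version. Each change is computed as current / past - 1 to match the OP's expected percentages. The julianday() differences stand in for DATEDIFF(day, ...). The table and column names are assumptions. Window functions such as LAG need SQLite 3.25 or later, which recent Python builds usually bundle.

```python
import sqlite3

# In-memory SQLite database; table/column names are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dt TEXT PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("2023-02-28", 120.0), ("2023-01-31", 127.2),
     ("2023-01-01", 100.0), ("2022-04-05", 110.0)],
)

QUERY = """
WITH lagged AS (
    SELECT dt, value,
           LAG(dt, 1)    OVER (ORDER BY dt) AS d1,
           LAG(dt, 2)    OVER (ORDER BY dt) AS d2,
           LAG(value, 1) OVER (ORDER BY dt) AS v1,
           LAG(value, 2) OVER (ORDER BY dt) AS v2
    FROM t
)
SELECT dt, value,
       -- julianday() difference gives the gap in days between two dates
       value / CASE WHEN julianday(dt) - julianday(d1) BETWEEN 1 AND 30 THEN v1 END - 1
           AS chg30,
       value / CASE WHEN julianday(dt) - julianday(d1) BETWEEN 31 AND 60 THEN v1
                    WHEN julianday(dt) - julianday(d2) BETWEEN 31 AND 60 THEN v2 END - 1
           AS chg60
FROM lagged
ORDER BY dt DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```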
No joins are required and it only involves a sort on the date column, so this should be very efficient.
Answer 3
Score: 0
Potentially you could also do something like the query below, which:
- assumes at most one row per date in the source data
- generates rows for any missing dates
- uses FIRST_VALUE with IGNORE NULLS to find the value from the first date in the target range that has data
The query below uses some SQL Server 2022-specific features (GENERATE_SERIES and the IGNORE NULLS option of FIRST_VALUE), so if you aren't on that version you will need to find alternatives.
WITH DateLimits AS
(
SELECT MIN(Date) AS MinDate, DATEDIFF(DAY, MIN(Date), MAX(Date)) AS RangeSize
FROM YourTable
), Expanded AS
(
SELECT D.Date,
Y.Value,
Within30 = FIRST_VALUE(Y.Value) IGNORE NULLS OVER (ORDER BY D.Date ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING) ,
Within60 = FIRST_VALUE(Y.Value) IGNORE NULLS OVER (ORDER BY D.Date ROWS BETWEEN 60 PRECEDING AND 31 PRECEDING)
FROM DateLimits
CROSS APPLY GENERATE_SERIES(0, RangeSize) G
CROSS APPLY (SELECT DATEADD(DAY, G.value, MinDate)) D(Date)
LEFT JOIN YourTable Y ON Y.Date = D.Date
)
SELECT Date,
Value,
(Value - Within30)/Within30 AS [Change in last 30 days],
(Value - Within60)/Within60 AS [Change in last 60 days]
FROM Expanded
WHERE Value IS NOT NULL
ORDER BY Date DESC
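For readers not on SQL Server 2022, here is a pure-Python sketch of what the query computes. It expands the data to one slot per day. For each original row, it then takes the earliest non-null value in the frame from hi to lo days back. That mirrors FIRST_VALUE with IGNORE NULLS over ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING, and over 60 and 31 PRECEDING for the other column. The name windowed_first_values is hypothetical. Like the query, the sketch assumes at most one row per date.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

def windowed_first_values(
    rows: List[Tuple[date, float]], lo: int, hi: int
) -> Dict[date, Optional[float]]:
    """For each source date, the earliest non-null value in the frame
    ROWS BETWEEN hi PRECEDING AND lo PRECEDING over a dense daily calendar."""
    values = dict(rows)  # assumes at most one row per date
    start, end = min(values), max(values)
    # Dense calendar: one slot per day, None where the source has no row
    series = [values.get(start + timedelta(days=n))
              for n in range((end - start).days + 1)]
    result: Dict[date, Optional[float]] = {}
    for i, value in enumerate(series):
        if value is None:
            continue  # like WHERE Value IS NOT NULL: keep only real rows
        frame = series[max(0, i - hi):max(0, i - lo + 1)]
        # FIRST_VALUE with IGNORE NULLS: earliest non-null slot in the frame
        result[start + timedelta(days=i)] = next(
            (v for v in frame if v is not None), None)
    return result

ROWS = [(date(2023, 2, 28), 120.0), (date(2023, 1, 31), 127.2),
        (date(2023, 1, 1), 100.0), (date(2022, 4, 5), 110.0)]

within30 = windowed_first_values(ROWS, 1, 30)
within60 = windowed_first_values(ROWS, 31, 60)

for d, v in sorted(ROWS, reverse=True):
    w30, w60 = within30[d], within60[d]
    chg30 = (v - w30) / w30 if w30 is not None else None
    chg60 = (v - w60) / w60 if w60 is not None else None
    print(d, v, chg30, chg60)
```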