June 2, 2023 06:13:45

SQL window function to compare values by date

Question

I have a table with a Date column and a value column.

Date	Value
2/28/2023	120
1/31/2023	127.2
1/1/2023	100
4/5/2022	110
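
To make the example reproducible, the sample rows could be loaded roughly like this. This is only a sketch: the table name Data matches the query in the first answer, while the column types and the primary key are assumptions.

```sql
-- Assumed schema: one row per date, decimal values (types are illustrative).
CREATE TABLE Data (
    [Date] date           NOT NULL PRIMARY KEY,
    Value  decimal(10, 2) NOT NULL
);

INSERT INTO Data ([Date], Value)
VALUES ('2023-02-28', 120),
       ('2023-01-31', 127.2),
       ('2023-01-01', 100),
       ('2022-04-05', 110);
```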

I want to create two more columns that compare each value based on dates (30-day change and 60-day change).

Here is the output:

Date	Value	Change in last 30 days	Change in last 60 days
2/28/2023	120	-6%	20%
1/31/2023	127.2	27%
1/1/2023	100
4/5/2022	110

So, the output table compares the value on 2/28/2023 with the value on 1/31/2023 (within 30 days) and with the value on 1/1/2023 (within 60 days). The 1/31/2023 value is compared with the 1/1/2023 value, which is within 30 days.
How can I do this in SQL?

Thank you.

Answer 1

Score: 2

I don't know if you can use a window function to select based on a relative date range, but you can use a pair of OUTER APPLY(SELECT TOP 1 ...) constructs to select appropriate 30-day and 60-day past values for the calculation.

One ambiguity is the definition of the 30-day and 60-day past value, if no exact 30- or 60-day past entry is present. Some options are:

Most recent value at least N days old.
Oldest value at most N days old.
Either of the above with additional limits on the range.
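
The first two options differ only in the filter and the sort direction of the lookup. A minimal sketch for N = 30, assuming a table Data with Date and Value columns as in the query below; the aliases AtLeast30 and AtMost30 and the output column names are made up for illustration:

```sql
SELECT D.[Date], D.Value,
       AtLeast30.Value AS MostRecentAtLeast30DaysOld,
       AtMost30.Value  AS OldestAtMost30DaysOld
FROM Data D
OUTER APPLY (
    -- Option 1: most recent value at least 30 days old.
    SELECT TOP 1 P.Value
    FROM Data P
    WHERE P.[Date] <= DATEADD(day, -30, D.[Date])
    ORDER BY P.[Date] DESC      -- newest qualifying row first
) AtLeast30
OUTER APPLY (
    -- Option 2: oldest value at most 30 days old (excluding the row itself).
    SELECT TOP 1 P.Value
    FROM Data P
    WHERE P.[Date] <  D.[Date]
      AND P.[Date] >= DATEADD(day, -30, D.[Date])
    ORDER BY P.[Date] ASC       -- oldest qualifying row first
) AtMost30;
```

Option 1 has no lower bound, so it can reach back arbitrarily far. With the question's four rows, the 2023-01-01 row would pick up the 2022-04-05 value, which is where the third option, an extra limit on the range, comes in.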

For the following, I chose a solution that seemed to match the OP's desired results.

Although result formatting is often best left to the presentation layer, I've used FORMAT( ..., 'P0') to display the results in a percent format. (This was adapted from zhiguang's answer.)

SELECT D.*
    , FORMAT(D.Value / NULLIF(D30.Value, 0) - 1, 'P0') AS [%Change in last 30 days]
    , FORMAT(D.Value / NULLIF(D60.Value, 0) - 1, 'P0') AS [%Change in last 60 days]
FROM Data D
OUTER APPLY (
    -- oldest value from 1 to 30 days before the current row
    SELECT TOP 1 D1.Value
    FROM Data D1
    WHERE D1.Date < D.Date
    AND D1.Date >= DATEADD(day, -30, D.Date)
    ORDER BY D1.Date
) D30
OUTER APPLY (
    -- oldest value from 31 to 60 days before the current row
    SELECT TOP 1 D2.Value
    FROM Data D2
    WHERE D2.Date < DATEADD(day, -30, D.Date)
    AND D2.Date >= DATEADD(day, -60, D.Date)
    ORDER BY D2.Date
) D60

An OUTER APPLY is like a left join to a subselect that can include references back to earlier parts of the main query. (A CROSS APPLY is similar, but behaves like an inner join.) The OUTER APPLY is used here to allow for no match conditions.
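
The difference can be seen by running the same lookup with both operators. A sketch, assuming the Data table from the query above; the alias Prev and the column name PrevValue are illustrative:

```sql
-- OUTER APPLY: every row of Data is returned; PrevValue is NULL
-- when no earlier row falls inside the 30-day window.
SELECT D.[Date], D.Value, Prev.Value AS PrevValue
FROM Data D
OUTER APPLY (
    SELECT TOP 1 D1.Value
    FROM Data D1
    WHERE D1.[Date] <  D.[Date]
      AND D1.[Date] >= DATEADD(day, -30, D.[Date])
    ORDER BY D1.[Date]
) Prev;

-- CROSS APPLY: rows of Data with no match in the window are dropped.
SELECT D.[Date], D.Value, Prev.Value AS PrevValue
FROM Data D
CROSS APPLY (
    SELECT TOP 1 D1.Value
    FROM Data D1
    WHERE D1.[Date] <  D.[Date]
      AND D1.[Date] >= DATEADD(day, -30, D.[Date])
    ORDER BY D1.[Date]
) Prev;
```

With the question's four rows, the first query returns all four rows, while the second returns only the 2023-02-28 and 2023-01-31 rows, since the other two have nothing in their 30-day window.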

The above also uses the NULLIF() function to protect against potential divide-by-zero errors.
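
As a small standalone illustration of that guard (the literal values are arbitrary apart from 120 and 127.2, which come from the question):

```sql
-- NULLIF(a, b) returns NULL when a = b, otherwise it returns a.
SELECT 120.0 / NULLIF(0.0, 0)   - 1 AS ZeroDivisor,    -- NULL instead of a divide-by-zero error
       120.0 / NULLIF(127.2, 0) - 1 AS NormalDivisor;  -- about -0.0566
```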

See this db<>fiddle for a working demo that also includes additional data having multiple values within the same 30-day window.

Note that some of the selected past values might not seem obviously correct until you recognize that some months have lengths other than 30 days. Changing the date range calculations to use month offsets instead of day offsets might make the results more intuitive.
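
A sketch of the month-based variant, under the assumption that "30 days" and "60 days" are really meant as one and two calendar months. Only the DATEADD units change relative to the query above; the aliases M1 and M2 and the output column names are made up:

```sql
SELECT D.*
    , FORMAT(D.Value / NULLIF(M1.Value, 0) - 1, 'P0') AS [%Change in last month]
    , FORMAT(D.Value / NULLIF(M2.Value, 0) - 1, 'P0') AS [%Change in last 2 months]
FROM Data D
OUTER APPLY (
    -- oldest value within the previous calendar month
    SELECT TOP 1 D1.Value
    FROM Data D1
    WHERE D1.[Date] <  D.[Date]
      AND D1.[Date] >= DATEADD(month, -1, D.[Date])
    ORDER BY D1.[Date]
) M1
OUTER APPLY (
    -- oldest value between one and two calendar months back
    SELECT TOP 1 D2.Value
    FROM Data D2
    WHERE D2.[Date] <  DATEADD(month, -1, D.[Date])
      AND D2.[Date] >= DATEADD(month, -2, D.[Date])
    ORDER BY D2.[Date]
) M2;
```

DATEADD with a month unit clamps to the last day of a shorter month, so one month before 2023-03-31 is 2023-02-28.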

Sample results:

Date	Value	%Change in last 30 days	%Change in last 60 days	D30_Value	D60_Value
2023-02-28	120.00	-6%	20%	127.20	100.00
2023-01-31	127.20	27%	null	100.00	null
2023-01-01	100.00	null	null	null	null
2022-04-05	110.00	null	null	null	null
2021-04-21	210.00	11%	40%	190.00	150.00
2021-04-11	200.00	11%	43%	180.00	140.00
2021-04-01	190.00	12%	46%	170.00	130.00
2021-03-21	180.00	20%	50%	150.00	120.00
2021-03-11	170.00	21%	55%	140.00	110.00
2021-03-01	160.00	23%	60%	130.00	100.00
2021-02-21	150.00	15%	50%	130.00	100.00
2021-02-11	140.00	17%	40%	120.00	100.00
2021-02-01	130.00	18%	30%	110.00	100.00
2021-01-21	120.00	20%	null	100.00	null
2021-01-11	110.00	10%	null	100.00	null
2021-01-01	100.00	null	null	null	null

I'll leave it to the OP to tweak the date range selection logic and the final result rounding and formatting.
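
As one possible tweak, the change could be returned as a rounded number rather than a formatted string, with the looked-up values exposed for checking. The column names D30_Value and D60_Value follow the sample results above; PctChange30, PctChange60, and the one-decimal rounding are arbitrary choices, and Value is assumed to be a decimal rather than an integer column.

```sql
SELECT D.[Date], D.Value
    , ROUND(100.0 * (D.Value / NULLIF(D30.Value, 0) - 1), 1) AS PctChange30
    , ROUND(100.0 * (D.Value / NULLIF(D60.Value, 0) - 1), 1) AS PctChange60
    , D30.Value AS D30_Value   -- past value used for the 30-day change
    , D60.Value AS D60_Value   -- past value used for the 60-day change
FROM Data D
OUTER APPLY (
    SELECT TOP 1 D1.Value
    FROM Data D1
    WHERE D1.[Date] <  D.[Date]
      AND D1.[Date] >= DATEADD(day, -30, D.[Date])
    ORDER BY D1.[Date]
) D30
OUTER APPLY (
    SELECT TOP 1 D2.Value
    FROM Data D2
    WHERE D2.[Date] <  DATEADD(day, -30, D.[Date])
      AND D2.[Date] >= DATEADD(day, -60, D.[Date])
    ORDER BY D2.[Date]
) D60
ORDER BY D.[Date] DESC;
```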

Answer 2

Score: 0

Here's a pattern that might be useful. Collect the nearest values and then determine which you want to keep. If you have a cap on the number of possible lookback rows, then you can use this method and adjust for whichever option you like.

In this example I'm basically assuming that there might be gaps but that there will only be a single value per 30 days. While that might not apply to your data, it should be easy to incorporate the extra conditions:

with data as (
    -- pull the dates and values of the two preceding rows onto each row
    select *,
        lag("date", 1) over (order by "date") as d1,
        lag("date", 2) over (order by "date") as d2,
        lag(value, 1)  over (order by "date") as v1,
        lag(value, 2)  over (order by "date") as v2
    from T
), picked as (
    -- keep a lagged value only if its date falls inside the target window
    select *,
        case when datediff(day, d1, "date") between  1 and 30 then v1 end as v30,
        case when datediff(day, d1, "date") between 31 and 60 then v1
             when datediff(day, d2, "date") between 31 and 60 then v2 end as v60
    from data
)
-- change is (current - past) / past, matching the question's expected output
select *,
    (value - v30) / nullif(v30, 0) as "30daychange",
    (value - v60) / nullif(v60, 0) as "60daychange"
from picked;

No joins are required, and the query only needs a sort on the date column, so it should be efficient.
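
If the sort itself becomes the bottleneck, an index on the date column may let the engine read the rows already in order. A sketch, assuming the table T from the query above; the index name is made up, and whether the optimizer actually uses the index depends on the data and the plan:

```sql
-- Hypothetical covering index: rows can be read in "date" order,
-- and value is included so the base table need not be revisited.
CREATE INDEX IX_T_date ON T ("date") INCLUDE (value);
```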

Answer 3

Score: 0

Potentially you could also do something like the query below, which:

assumes at most one row per date in the source data
generates rows for any missing dates
uses FIRST_VALUE with IGNORE NULLS to find the value from the first date in the target range that has data
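
The effect of IGNORE NULLS over a frame of preceding rows can be seen on a toy data set. This is only an illustration: the inline values, the three-row frame, and the names d, v, and FirstInPrev3 are made up, and it needs SQL Server 2022 like the main query.

```sql
-- Toy data: day number d, value v (NULL on days without a reading).
SELECT d, v,
       FIRST_VALUE(v) IGNORE NULLS OVER (
           ORDER BY d ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING
       ) AS FirstInPrev3
FROM (VALUES (1, 10), (2, NULL), (3, NULL), (4, 40), (5, NULL)) AS t(d, v);
```

For d = 5 the frame holds NULL, NULL, 40, so the NULLs are skipped and the result is 40. For d = 1 the frame is empty and the result is NULL.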

The query below uses features specific to SQL Server 2022 (GENERATE_SERIES and IGNORE NULLS), so if you aren't on that version you would need to find alternatives.

WITH DateLimits AS
(
    -- first date in the table and the number of days spanned
    SELECT MIN(Date) AS MinDate, DATEDIFF(DAY, MIN(Date), MAX(Date)) AS RangeSize
    FROM YourTable
), Expanded AS
(
    -- one row per calendar day; Value is NULL on days with no source row
    SELECT  D.Date,
            Y.Value,
            Within30 = FIRST_VALUE(Y.Value) IGNORE NULLS OVER (ORDER BY D.Date ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING),
            Within60 = FIRST_VALUE(Y.Value) IGNORE NULLS OVER (ORDER BY D.Date ROWS BETWEEN 60 PRECEDING AND 31 PRECEDING)
    FROM DateLimits
    CROSS APPLY GENERATE_SERIES(0, RangeSize) G
    CROSS APPLY (SELECT DATEADD(DAY, G.value, MinDate)) D(Date)
    LEFT JOIN YourTable Y ON Y.Date = D.Date
)
SELECT Date,
       Value,
       (Value - Within30) / Within30 AS [Change in last 30 days],
       (Value - Within60) / Within60 AS [Change in last 60 days]
FROM Expanded
WHERE Value IS NOT NULL  -- drop the generated filler days
ORDER BY Date DESC
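
On versions before SQL Server 2022, the row-generation step could be sketched with a recursive CTE instead of GENERATE_SERIES. This covers only the calendar expansion; replacing IGNORE NULLS needs a different technique, such as the OUTER APPLY lookup from Answer 1. The CTE name AllDates is made up, and the Date column is assumed to be of type date.

```sql
-- Recursive CTE that yields one row per calendar day between the
-- earliest and latest dates in YourTable (table name as in the query above).
WITH DateLimits AS
(
    SELECT MIN(Date) AS MinDate, MAX(Date) AS MaxDate
    FROM YourTable
), AllDates AS
(
    SELECT MinDate AS Date, MaxDate
    FROM DateLimits
    UNION ALL
    SELECT DATEADD(DAY, 1, Date), MaxDate
    FROM AllDates
    WHERE Date < MaxDate
)
SELECT A.Date, Y.Value
FROM AllDates A
LEFT JOIN YourTable Y ON Y.Date = A.Date
ORDER BY A.Date
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels
```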
