如何在 SQL Server 中显示日期范围,忽略被更新条目覆盖的较早条目

huangapple go评论56阅读模式
英文:

How to display date ranges ignoring older entries covered by newer entries in SQL Server

问题

我们需要筛选掉与较新行相交的一些日期范围,我们该如何做?

这是我目前的SQL,我们需要的是只返回最后3行,因为前2行被后来添加的条目所覆盖。

DROP TABLE IF EXISTS #ranges

CREATE TABLE #ranges (
    StartDate DATETIME2(7), 
    EndDate DATETIME2(7),
    CreatedDate DATETIME2(7),
    Cost DECIMAL
);

INSERT INTO #ranges(StartDate, EndDate, CreatedDate, Cost)
VALUES
('2024-01-02 00:00:00', '2024-01-05 00:00:00', '2023-07-10 19:40:19.60', 10.00),
('2024-01-02 00:00:00', '2024-01-03 00:00:00', '2023-07-10 19:41:45.73', 5.00),
('2024-01-02 00:00:00', '2024-01-05 00:00:00', '2023-07-10 19:45:32.76', 22.00),
('2024-01-03 00:00:00', '2024-01-05 00:00:00', '2023-07-10 19:54:41.60', 44.00),
('2024-01-04 00:00:00', '2024-01-04 00:00:00', '2023-07-10 19:54:52.60', 50.00);

我们希望得到如下的数据,原因是最后3行覆盖了这个日期范围,在这种情况下为1月1日至5日。可能会有重叠,但它们覆盖了该范围。或者我们应该做些其他事情,例如只返回每天的成本?

StartDate EndDate CreatedDate Cost
2024-01-02 00:00:00 2024-01-05 00:00:00 2023-07-10 19:45:32.76 22.00
2024-01-03 00:00:00 2024-01-05 00:00:00 2023-07-10 19:54:41.60 44.00
2024-01-04 00:00:00 2024-01-04 00:00:00 2023-07-10 19:54:52.60 50.00

在表格中,它会显示如下:

2024/01/02 2024/01/03 2024/01/04 2024/01/05
22.00 44.00 50.00 44.00

非常感谢。

英文:

We need to filter out some date ranges that intersect with newer rows how might we do this?

This is the SQL I have at the moment, what we need if for it to just return then last 3 rows as the first 2 rows are covered by later entries that were added.

DROP TABLE IF EXISTS #ranges

CREATE TABLE #ranges (
	StartDate DATETIME2(7), 
	EndDate DATETIME2(7),
	CreatedDate DATETIME2(7),
	Cost DECIMAL
);

INSERT INTO #ranges(StartDate, EndDate, CreatedDate, Cost)
VALUES
('2024-01-02 00:00:00', '2024-01-05 00:00:00', '2023-07-10 19:40:19.60', 10.00),
('2024-01-02 00:00:00', '2024-01-03 00:00:00', '2023-07-10 19:41:45.73', 5.00),
('2024-01-02 00:00:00', '2024-01-05 00:00:00', '2023-07-10 19:45:32.76', 22.00),
('2024-01-03 00:00:00', '2024-01-05 00:00:00', '2023-07-10 19:54:41.60', 44.00),
('2024-01-04 00:00:00', '2024-01-04 00:00:00', '2023-07-10 19:54:52.60', 50.00);

We would like to get the data like this, the reason being that the last 3 rows cover that date period in this case the 1st to the 5th of Jan. There may be an overlap but there cover that range. Or should we do something else e.g. just return a row for each day with the cost for that day?

StartDate EndDate CreatedDate Cost
2024-01-02 00:00:00 2024-01-05 00:00:00 2023-07-10 19:45:32.76 22.00
2024-01-03 00:00:00 2024-01-05 00:00:00 2023-07-10 19:54:41.60 44.00
2024-01-04 00:00:00 2024-01-04 00:00:00 2023-07-10 19:54:52.60 50.00

It will look like this in a table it is output in

2024/01/02 2024/01/03 2024/01/04 2024/01/05
22.00 44.00 50.00 44.00

Many thanks

答案1

得分: 2

你可以使用以下查询

删除 r1
从范围 r1
其中存在 (选择 1
    从 (
        选择 *
          PrevEndDate = LAG(EndDate) 在 (按 StartDate 排序)
        从范围 r2
        其中 r2.CreatedDate > r1.CreatedDate
          并且 r1.StartDate < r2.EndDate
          并且 r1.EndDate > r2.StartDate
    ) r2
COUNT(CASE WHEN PrevEndDate < StartDate THEN 1 END) = 0
       并且 MIN(r2.StartDate) <= r1.StartDate
       并且 MAX(r2.EndDate)   >= r1.EndDate
);

db<>fiddle

逻辑如下:

  • 获取所有范围中的行

  • 对于每个行,确保存在一组其他行,使得:

    • 所有行的CreatedDate在这一行之后。
    • 所有行与此行重叠(通过进行标准的开始/结束比较来进行重叠检查)
    • 对内部集合按StartDate排序,并为每个行获取上一个EndDate
    • 确保整个集合不包含间隙。
    • 最低StartDate小于或等于外部行。
    • 最高EndDate大于或等于外部行。
英文:

You can use the following query

DELETE r1
FROM ranges r1
WHERE EXISTS (SELECT 1
    FROM (
        SELECT *,
          PrevEndDate = LAG(EndDate) OVER (ORDER BY StartDate)
        FROM ranges r2
        WHERE r2.CreatedDate &gt; r1.CreatedDate
          AND r1.StartDate &lt; r2.EndDate
          AND r1.EndDate &gt; r2.StartDate
    ) r2
    HAVING COUNT(CASE WHEN PrevEndDate &lt; StartDate THEN 1 END) = 0
       AND MIN(r2.StartDate) &lt;= r1.StartDate
       AND MAX(r2.EndDate)   &gt;= r1.EndDate
);

db<>fiddle

The logic is as follows:

  • Take all rows in ranges

  • For each one, ensure that there exists a set of other rows such that:

    • All rows are CreatedDate after this one.
    • All rows overlap this one (by doing a standard start/end comparison to get an overlap check)
    • Sort the inner set by StartDate and get the previous EndDate for each.
    • Ensure that the set as a whole does not contain gaps.
    • And that the lowest StartDate is less than or equal to the outer row.
    • And that the highest EndDate is greater than or equal to the outer row.

答案2

得分: 0

以下是翻译好的部分:

可能有一种更优雅的解决方案,可以减少对数据的迭代次数,但是这是一个直接的方法,用于回答你的第一个问题。

SELECT * FROM
(
    SELECT *,
        RANK() OVER(PARTITION BY GroupId ORDER BY CreatedDate DESC) AS GroupRank
    FROM
    (
        SELECT *,
            SUM(StartDateBoundaries) OVER (ORDER BY CreatedDate ROWS UNBOUNDED PRECEDING) AS GroupId
        FROM
        (
            SELECT *,		
                CASE WHEN LAG(StartDate) OVER (ORDER BY StartDate) <> StartDate OR LAG(StartDate) OVER (ORDER BY StartDate) IS NULL THEN 1 ELSE 0 END AS StartDateBoundaries
            FROM #ranges
        ) AS X 
    ) AS Y
) AS Z
WHERE
    GroupRank = 1

希望这对你有所帮助。如果你需要进一步的翻译或有其他问题,请随时提出。

英文:

There may be a more elegant solution with fewer iterations over the data, however, here is a straightforward approach, for your first question.

SELECT * FROM
(
	SELECT *,
		RANK() OVER(PARTITION BY GroupId ORDER BY CreatedDate DESC) As GroupRank
	FROM
	(
		SELECT *,
			SUM(StartDateBoundries) OVER (ORDER BY CreatedDate ROWS UNBOUNDED PRECEDING) GroupId
		FROM
		(
			SELECT *,		
				CASE WHEN LAG(StartDate) OVER (ORDER BY StartDate) &lt;&gt; StartDate OR LAG(StartDate) OVER (ORDER BY StartDate) IS NULL THEN 1 ELSE 0 END AS StartDateBoundries
			FROM #ranges
		) AS X 
	)AS Y
)AS Z
WHERE
	GroupRank = 1

huangapple
  • 本文由 发表于 2023年7月10日 22:51:41
  • 转载请务必保留本文链接:https://go.coder-hub.com/76654936.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定