英文:
How to display date ranges ignoring older entries covered by newer entries in SQL Server
问题
我们需要筛选掉与较新行相交的一些日期范围,我们该如何做?
这是我目前的SQL,我们需要的是只返回最后3行,因为前2行被后来添加的条目所覆盖。
DROP TABLE IF EXISTS #ranges
CREATE TABLE #ranges (
StartDate DATETIME2(7),
EndDate DATETIME2(7),
CreatedDate DATETIME2(7),
Cost DECIMAL
);
INSERT INTO #ranges(StartDate, EndDate, CreatedDate, Cost)
VALUES
('2024-01-02 00:00:00', '2024-01-05 00:00:00', '2023-07-10 19:40:19.60', 10.00),
('2024-01-02 00:00:00', '2024-01-03 00:00:00', '2023-07-10 19:41:45.73', 5.00),
('2024-01-02 00:00:00', '2024-01-05 00:00:00', '2023-07-10 19:45:32.76', 22.00),
('2024-01-03 00:00:00', '2024-01-05 00:00:00', '2023-07-10 19:54:41.60', 44.00),
('2024-01-04 00:00:00', '2024-01-04 00:00:00', '2023-07-10 19:54:52.60', 50.00);
我们希望得到如下的数据,原因是最后3行覆盖了这个日期范围,在这种情况下为1月1日至5日。可能会有重叠,但它们覆盖了该范围。或者我们应该做些其他事情,例如只返回每天的成本?
StartDate | EndDate | CreatedDate | Cost |
---|---|---|---|
2024-01-02 00:00:00 | 2024-01-05 00:00:00 | 2023-07-10 19:45:32.76 | 22.00 |
2024-01-03 00:00:00 | 2024-01-05 00:00:00 | 2023-07-10 19:54:41.60 | 44.00 |
2024-01-04 00:00:00 | 2024-01-04 00:00:00 | 2023-07-10 19:54:52.60 | 50.00 |
在表格中,它会显示如下:
2024/01/02 | 2024/01/03 | 2024/01/04 | 2024/01/05 |
---|---|---|---|
22.00 | 44.00 | 50.00 | 44.00 |
非常感谢。
英文:
We need to filter out some date ranges that intersect with newer rows how might we do this?
This is the SQL I have at the moment, what we need if for it to just return then last 3 rows as the first 2 rows are covered by later entries that were added.
DROP TABLE IF EXISTS #ranges
CREATE TABLE #ranges (
StartDate DATETIME2(7),
EndDate DATETIME2(7),
CreatedDate DATETIME2(7),
Cost DECIMAL
);
INSERT INTO #ranges(StartDate, EndDate, CreatedDate, Cost)
VALUES
('2024-01-02 00:00:00', '2024-01-05 00:00:00', '2023-07-10 19:40:19.60', 10.00),
('2024-01-02 00:00:00', '2024-01-03 00:00:00', '2023-07-10 19:41:45.73', 5.00),
('2024-01-02 00:00:00', '2024-01-05 00:00:00', '2023-07-10 19:45:32.76', 22.00),
('2024-01-03 00:00:00', '2024-01-05 00:00:00', '2023-07-10 19:54:41.60', 44.00),
('2024-01-04 00:00:00', '2024-01-04 00:00:00', '2023-07-10 19:54:52.60', 50.00);
We would like to get the data like this, the reason being that the last 3 rows cover that date period in this case the 1st to the 5th of Jan. There may be an overlap but there cover that range. Or should we do something else e.g. just return a row for each day with the cost for that day?
StartDate | EndDate | CreatedDate | Cost |
---|---|---|---|
2024-01-02 00:00:00 | 2024-01-05 00:00:00 | 2023-07-10 19:45:32.76 | 22.00 |
2024-01-03 00:00:00 | 2024-01-05 00:00:00 | 2023-07-10 19:54:41.60 | 44.00 |
2024-01-04 00:00:00 | 2024-01-04 00:00:00 | 2023-07-10 19:54:52.60 | 50.00 |
It will look like this in a table it is output in
2024/01/02 | 2024/01/03 | 2024/01/04 | 2024/01/05 |
---|---|---|---|
22.00 | 44.00 | 50.00 | 44.00 |
Many thanks
答案1
得分: 2
你可以使用以下查询
删除 r1
从范围 r1
其中存在 (选择 1
从 (
选择 *,
PrevEndDate = LAG(EndDate) 在 (按 StartDate 排序)
从范围 r2
其中 r2.CreatedDate > r1.CreatedDate
并且 r1.StartDate < r2.EndDate
并且 r1.EndDate > r2.StartDate
) r2
有 COUNT(CASE WHEN PrevEndDate < StartDate THEN 1 END) = 0
并且 MIN(r2.StartDate) <= r1.StartDate
并且 MAX(r2.EndDate) >= r1.EndDate
);
逻辑如下:
-
获取所有
范围
中的行 -
对于每个行,确保存在一组其他行,使得:
- 所有行的
CreatedDate
在这一行之后。 - 所有行与此行重叠(通过进行标准的开始/结束比较来进行重叠检查)
- 对内部集合按
StartDate
排序,并为每个行获取上一个EndDate
。 - 确保整个集合不包含间隙。
- 最低
StartDate
小于或等于外部行。 - 最高
EndDate
大于或等于外部行。
- 所有行的
英文:
You can use the following query
DELETE r1
FROM ranges r1
WHERE EXISTS (SELECT 1
FROM (
SELECT *,
PrevEndDate = LAG(EndDate) OVER (ORDER BY StartDate)
FROM ranges r2
WHERE r2.CreatedDate > r1.CreatedDate
AND r1.StartDate < r2.EndDate
AND r1.EndDate > r2.StartDate
) r2
HAVING COUNT(CASE WHEN PrevEndDate < StartDate THEN 1 END) = 0
AND MIN(r2.StartDate) <= r1.StartDate
AND MAX(r2.EndDate) >= r1.EndDate
);
The logic is as follows:
-
Take all rows in
ranges
-
For each one, ensure that there exists a set of other rows such that:
- All rows are
CreatedDate
after this one. - All rows overlap this one (by doing a standard start/end comparison to get an overlap check)
- Sort the inner set by
StartDate
and get the previousEndDate
for each. - Ensure that the set as a whole does not contain gaps.
- And that the lowest
StartDate
is less than or equal to the outer row. - And that the highest
EndDate
is greater than or equal to the outer row.
- All rows are
答案2
得分: 0
以下是翻译好的部分:
可能有一种更优雅的解决方案,可以减少对数据的迭代次数,但是这是一个直接的方法,用于回答你的第一个问题。
SELECT * FROM
(
SELECT *,
RANK() OVER(PARTITION BY GroupId ORDER BY CreatedDate DESC) AS GroupRank
FROM
(
SELECT *,
SUM(StartDateBoundaries) OVER (ORDER BY CreatedDate ROWS UNBOUNDED PRECEDING) AS GroupId
FROM
(
SELECT *,
CASE WHEN LAG(StartDate) OVER (ORDER BY StartDate) <> StartDate OR LAG(StartDate) OVER (ORDER BY StartDate) IS NULL THEN 1 ELSE 0 END AS StartDateBoundaries
FROM #ranges
) AS X
) AS Y
) AS Z
WHERE
GroupRank = 1
希望这对你有所帮助。如果你需要进一步的翻译或有其他问题,请随时提出。
英文:
There may be a more elegant solution with fewer iterations over the data, however, here is a straightforward approach, for your first question.
SELECT * FROM
(
SELECT *,
RANK() OVER(PARTITION BY GroupId ORDER BY CreatedDate DESC) As GroupRank
FROM
(
SELECT *,
SUM(StartDateBoundries) OVER (ORDER BY CreatedDate ROWS UNBOUNDED PRECEDING) GroupId
FROM
(
SELECT *,
CASE WHEN LAG(StartDate) OVER (ORDER BY StartDate) <> StartDate OR LAG(StartDate) OVER (ORDER BY StartDate) IS NULL THEN 1 ELSE 0 END AS StartDateBoundries
FROM #ranges
) AS X
)AS Y
)AS Z
WHERE
GroupRank = 1
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论