问题

我有一个雪花表，其中包括学生及其入学期间、开始日期和结束日期，如下所示：

学生ID	开始日期	结束日期
mid	2023-01-01	2023-04-30
mid	2022-12-01	2022-12-31
mid	2022-11-01	2022-11-30
mid	2022-02-01	2022-10-31
mid	2022-10-01	2022-10-31

我需要将其规范化为一行，其中显示学生的入学日期从2022-02-01开始，结束于2023-04-30。要将其视为一行，必须有一个连续的入学期间。中间的任何间断都将创建两行，以此类推。

我尝试使用Lag和Lead函数，但无法弄清楚。任何帮助将不胜感激。

英文:

I have a snowflake table which consists of students and their enrollment period, startdate and enddate as below:

studentid	startdate	enddate
mid	2023-01-01	2023-04-30
mid	2022-12-01	2022-12-31
mid	2022-11-01	2022-11-30
mid	2022-02-01	2022-10-31
mid	2022-10-01	2022-10-31

I need to normalize this into one single row, where it says student enrollment starts on 2022-02-01 and ends on 2023-04-30, There should be one continuous enrollment to consider it as one row. One Break in between will create two rows likewise more breaks more rows.

I tried with Lag Lead Functions but couldn't figure it out. Any help would be appreciated.

答案1

得分: 0

我不确定它是否涵盖了所有用例，但请尝试以下操作：

创建一个测试表并填充一些测试数据：

CREATE OR REPLACE TABLE enrollment_intervals (studentid int, startdate date, enddate date);

INSERT INTO enrollment_intervals  
VALUES
    (1, '2023-01-01', '2023-04-30'),
    (1, '2022-12-01', '2022-12-31'),
    (1, '2022-11-01', '2022-11-30'),
    (1, '2022-02-01', '2022-10-31'),
    (1, '2022-10-01', '2022-10-31'),
    (2, '2023-01-01', '2023-02-01'),
    (3, '2024-01-01', '2024-05-01');

实际查询：

WITH ordered_intervals AS (
    SELECT studentid,
           startdate,
           enddate,
           LEAD(startdate) OVER (PARTITION BY studentid ORDER BY startdate) next_startdate
    FROM enrollment_intervals
    ORDER BY studentid, startdate
),
merged_intervals AS (
    SELECT studentid,
           startdate,
           COALESCE(
               (CASE WHEN enddate < DATEADD(day, -1, next_startdate) THEN enddate ELSE NULL END),
               MAX(enddate) OVER (PARTITION BY studentid ORDER BY startdate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
           ) enddate
    FROM ordered_intervals
),
merged_intervals_with_lag AS (
    SELECT *,
           LAG(enddate) OVER (PARTITION BY studentid ORDER BY startdate) as prev_enddate
    FROM merged_intervals
),
grouped_intervals AS (
    SELECT *,
           SUM(CASE WHEN startdate > DATEADD(day, 1, prev_enddate) THEN 1 ELSE 0 END)
           OVER (PARTITION BY studentid ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as group_id
    FROM merged_intervals_with_lag
)
SELECT studentid,
       MIN(startdate) as continuous_start,
       MAX(enddate) as continuous_end
FROM grouped_intervals
GROUP BY studentid, group_id
ORDER BY studentid, continuous_start;

这里的想法是使用公共表达式（CTE）来合并时间段，检查是否没有间隙，并确保不会遗漏重叠和包含。希望能帮助您。

英文:

I am not sure if it covers all use-cases but try this:

Create a test table and fill it with some test data:

CREATE OR REPLACE TABLE enrollment_intervals (studentid int, startdate date, enddate date);

INSERT INTO enrollment_intervals  
VALUES
    (1, &#39;2023-01-01&#39;, &#39;2023-04-30&#39;),
    (1, &#39;2022-12-01&#39;, &#39;2022-12-31&#39;),
    (1, &#39;2022-11-01&#39;, &#39;2022-11-30&#39;),
    (1, &#39;2022-02-01&#39;, &#39;2022-10-31&#39;),
    (1, &#39;2022-10-01&#39;, &#39;2022-10-31&#39;),
    (2, &#39;2023-01-01&#39;, &#39;2023-02-01&#39;),
    (3, &#39;2024-01-01&#39;, &#39;2024-05-01&#39;);

Actual query:

WITH ordered_intervals AS (
    SELECT studentid,
           startdate,
           enddate,
           LEAD(startdate) OVER (PARTITION BY studentid ORDER BY startdate) next_startdate
    FROM enrollment_intervals
    ORDER BY studentid, startdate
),
merged_intervals AS (
    SELECT studentid,
           startdate,
           COALESCE(
               (CASE WHEN enddate &lt; DATEADD(day, -1, next_startdate) THEN enddate ELSE NULL END),
               MAX(enddate) OVER (PARTITION BY studentid ORDER BY startdate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
           ) enddate
    FROM ordered_intervals
),
merged_intervals_with_lag AS (
    SELECT *,
           LAG(enddate) OVER (PARTITION BY studentid ORDER BY startdate) as prev_enddate
    FROM merged_intervals
),
grouped_intervals AS (
    SELECT *,
           SUM(CASE WHEN startdate &gt; DATEADD(day, 1, prev_enddate) THEN 1 ELSE 0 END)
           OVER (PARTITION BY studentid ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as group_id
    FROM merged_intervals_with_lag
)
SELECT studentid,
       MIN(startdate) as continuous_start,
       MAX(enddate) as continuous_end
FROM grouped_intervals
GROUP BY studentid, group_id
ORDER BY studentid, continuous_start;

The idea here that you use CTE to merge intervals checking if there are no gaps and ensure that you are not missing overlaps and inclusions. I hope it helps.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

将多个记录归一化为一个记录，查看报名期。

问题

答案1

检索具有相同位置的最小和最大存储编号。

How to flag all records for a date when at least two records for that date have same duration OR at least one record has 0 durtation

春季启动查询DTO

需要通过在Oracle DB中查询来解析xmltype值

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论