How to keep only the rows that have the minimum/earliest date per group with SQL
Question
I am trying to filter a table in SQL so that only the rows that have the earliest date are kept per ID.
If I had the following table:
ID   Gender    Event_Date      
1     M        2015-03-01        
1     M        2012-04-15        
2     F        2005-11-14        
2     F        2005-11-14
2     F        2005-11-14
2     F        2013-05-19
2     F        2013-05-19
3     F        2010-07-02
3     F        2004-09-09
3     F        2004-02-17
3     F        2004-02-17
4     M        2019-01-29
5     M        2006-04-04
5     M        2006-08-07
And I would like to keep all of the rows that have the earliest date, when grouped by ID:
ID   Gender    Event_Date              
1     M        2012-04-15        
2     F        2005-11-14        
2     F        2005-11-14
2     F        2005-11-14
3     F        2004-02-17
3     F        2004-02-17
4     M        2019-01-29
5     M        2006-04-04
If I did the following:
SELECT * FROM table
GROUP BY ID
ORDER BY Event_Date ASC 
LIMIT 1;
How does LIMIT handle the ties? Would it keep all of the rows with earliest date or just pick one at random? (I want to keep all of them). I also tried versions of this:
SELECT * FROM table
WHERE Event_Date IS Min(Event_Date)
Answer 1
Score: 1
One canonical approach here would be to use the RANK() window function:
WITH cte AS (
    SELECT t.*, RANK() OVER (PARTITION BY ID ORDER BY Event_Date) rnk
    FROM yourTable t
)
SELECT ID, Gender, Event_Date
FROM cte
WHERE rnk = 1
ORDER BY ID;
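To illustrate why RANK() fits the requirement to keep ties, here is a minimal sketch that compares it with ROW_NUMBER(). The table name yourTable comes from the query above. The column types are an assumption, since the question does not state them, and the rows are just the ID 2 group from the question's sample data.

```sql
-- Assumed schema: the question does not give column types.
CREATE TABLE yourTable (ID INT, Gender CHAR(1), Event_Date DATE);

-- The ID = 2 group from the question, with a tie on the earliest date.
INSERT INTO yourTable (ID, Gender, Event_Date) VALUES
    (2, 'F', '2005-11-14'),
    (2, 'F', '2005-11-14'),
    (2, 'F', '2005-11-14'),
    (2, 'F', '2013-05-19'),
    (2, 'F', '2013-05-19');

-- Compare the two numbering schemes side by side.
SELECT ID,
       Event_Date,
       RANK()       OVER (PARTITION BY ID ORDER BY Event_Date) AS rnk,
       ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Event_Date) AS rn
FROM yourTable
ORDER BY Event_Date, rn;
```

All three 2005-11-14 rows share rnk = 1, so filtering on rnk = 1 keeps every tied row. ROW_NUMBER() numbers the same rows 1, 2, 3, so filtering on rn = 1 would keep only one of them.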
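If your database does not support window functions such as RANK(), you can use NOT EXISTS logic instead, which keeps every row for which no earlier row exists with the same ID: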
SELECT ID, Gender, Event_Date
FROM yourTable t1
WHERE NOT EXISTS (
    SELECT 1
    FROM yourTable t2
    WHERE t2.ID = t1.ID AND
          t2.Event_Date < t1.Event_Date
);
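On the two attempts in the question: LIMIT 1 applies to the whole result set rather than to each group, so it returns at most one row in total, and an aggregate such as MIN() cannot be used directly in a WHERE clause. A third option, closer in spirit to the WHERE attempt, is to compute the minimum date per ID in a derived table and join back to it. This is a sketch rather than part of the original answer. It assumes the same yourTable, and the aliases m and min_date are made up for illustration.

```sql
SELECT t.ID, t.Gender, t.Event_Date
FROM yourTable t
INNER JOIN (
    -- One row per ID carrying that ID's earliest date.
    SELECT ID, MIN(Event_Date) AS min_date
    FROM yourTable
    GROUP BY ID
) m
    ON  m.ID = t.ID
    AND m.min_date = t.Event_Date  -- keeps every row tied on the earliest date
ORDER BY t.ID;
```

Because the join matches on the date value itself, every row tied on the earliest date is returned, just as with the RANK() and NOT EXISTS versions.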