如何在使用GROUP BY时创建一个“其他”组?

huangapple go评论125阅读模式
英文:

How to create an "other" group when using GROUP BY?

问题

摘要:我有一些数据,我对其进行分组、计数,并按计数排序。我想保留前X行,并将其他行累积为一行,命名为"其他"。

以下是在DB Fiddle上提供的数据

  1. CREATE TABLE cities (name TEXT);
  2. INSERT INTO cities(name) VALUES ('paris'), ('paris'), ('london'), ('london'), ('london'), ('london'), ('rome'), ('madrid');
  3. SELECT name, COUNT(name) AS count FROM cities
  4. GROUP BY name
  5. ORDER BY count DESC
  6. LIMIT 2

上面的代码生成了出现次数最多的两个城市(具有最多条目的城市):

  1. | name | count |
  2. | ------ | ----- |
  3. | london | 4 |
  4. | paris | 2 |

我想要实现的是:

  1. | name | count |
  2. | ------ | ----- |
  3. | london | 4 |
  4. | paris | 2 |
  5. | other | 2 |

其中,所有其余的行(在我的例子中为'rome'和'madrid')都在'other'下合并在一起。

我不太了解SQL,但在伪代码中,我考虑了类似以下的内容:

  1. SELECT name, COUNT(name) AS count FROM cities
  2. GROUP BY name
  3. ORDER BY count DESC
  4. LIMIT 2 AS top_2
  5. LIMIT 2+ AS other

是否可以在单次查询中实现这个目标?作为一种权宜之计,我将执行相同的查询,但以相反的方式排序,并限制为"行数 - 前2行"(所以总共有3个查询,我想)。

(我很惊讶这个问题以前没有被提出过,但我找不到类似的问题。)

如您所请求,这是翻译好的部分。如果您需要进一步的帮助,请随时提问。

英文:

Summary: I have data that I group, count, and sort by count. I would like to keep only the first X lines, and have others accumulated as a line called "other"

The data below is available at DB Fiddle

  1. CREATE TABLE cities (name TEXT);
  2. INSERT INTO cities(name) VALUES ('paris'), ('paris'), ( 'london'), ( 'london'), ( 'london'), ( 'london'), ( 'rome'), ( 'madrid');
  3. SELECT name, COUNT(name) AS count FROM cities
  4. GROUP BY name
  5. ORDER BY count DESC
  6. LIMIT 2

The code above generates the top two most present cities (the ones that have the largest number of entries):

  1. | name | count |
  2. | ------ | ----- |
  3. | london | 4 |
  4. | paris | 2 |

What I would like to achieve is

  1. | name | count |
  2. | ------ | ----- |
  3. | london | 4 |
  4. | paris | 2 |
  5. | other | 2 |

where all remaining rows ('rome' and 'madrid' in my case) are grouped together under other.

I do not know SQL very much but in pseudo-code I am thinking about something like

  1. SELECT name, COUNT(name) AS count FROM cities
  2. GROUP BY name
  3. ORDER BY count DESC
  4. LIMIT 2 AS top_2
  5. LIMIT 2+ AS other

Is this possible in a single pass? As a workaround I will have the same query, but sorted the other way round and limited to "number of rows - the top 2" (so 3 queries in total I think).

(I am surprised that this has not been asked before, but could not find a match)

答案1

得分: 6

以下是您要翻译的部分:

一种选择是在row_number函数上使用一个case表达式来使用另一级别的汇总:

  1. WITH T AS
  2. (
  3. SELECT
  4. name,
  5. COUNT(name) AS count,
  6. ROW_NUMBER() OVER (ORDER BY COUNT(name) DESC) AS rn
  7. FROM cities
  8. GROUP BY name
  9. ORDER BY count DESC
  10. )
  11. SELECT
  12. CASE
  13. WHEN rn IN (1, 2)
  14. THEN name
  15. ELSE 'Others'
  16. END AS name,
  17. SUM(count) AS count
  18. FROM T
  19. GROUP BY 1
  20. ORDER BY count DESC

为了确保'Others'出现在结果集的最后,您可以使用以下方式进行排序:ORDER BY CASE WHEN rn IN (1, 2) THEN 1 ELSE 2 END, count DESC

英文:

One option is to use another level of aggregation with a case expression on a row_number function:

  1. WITH T AS
  2. (
  3. SELECT
  4. name,
  5. COUNT(name) AS count,
  6. ROW_NUMBER() OVER (ORDER BY COUNT(name) DESC) AS rn
  7. FROM cities
  8. GROUP BY name
  9. ORDER BY count DESC
  10. )
  11. SELECT
  12. CASE
  13. WHEN rn IN (1, 2)
  14. THEN name
  15. ELSE 'Others'
  16. END AS name,
  17. SUM(count) AS count
  18. FROM T
  19. GROUP BY 1
  20. ORDER BY count DESC

To ensure that 'Other' appear last in the result set, you may ORDER BY CASE WHEN rn IN (1, 2) THEN 1 ELSE 2 END, count DESC

Demo

答案2

得分: 0

以下是已经翻译好的部分:

使用通用表达式(Common Table Expression):

  1. WITH
  2. CTE_All (ct_all) AS
  3. (SELECT COUNT(name)
  4. FROM cities),
  5. CTE_Top (ct_name, ct_count) AS
  6. (SELECT TOP 2 name, COUNT(name)
  7. FROM cities
  8. GROUP BY name
  9. HAVING COUNT(name) > 1
  10. ORDER BY COUNT(name) DESC)
  11. SELECT * FROM CTE_Top
  12. UNION ALL
  13. SELECT 'other', (SELECT ct_all FROM CTE_All) - SUM(ct_count)
  14. FROM CTE_Top;
英文:

With common table expression:

  1. WITH
  2. CTE_All (ct_all) AS
  3. (SELECT COUNT(name)
  4. FROM cities),
  5. CTE_Top (ct_name, ct_count) AS
  6. (SELECT TOP 2 name, COUNT(name)
  7. FROM cities
  8. GROUP BY name
  9. HAVING COUNT(name) > 1
  10. ORDER BY COUNT(name) DESC)
  11. SELECT * FROM CTE_Top
  12. UNION ALL
  13. SELECT 'other', (SELECT ct_all FROM CTE_All) - SUM(ct_count)
  14. FROM CTE_Top;

huangapple
  • 本文由 发表于 2023年5月29日 19:35:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76356990.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定