如何在使用GROUP BY时创建一个“其他”组?

huangapple go评论78阅读模式
英文:

How to create an "other" group when using GROUP BY?

问题

摘要:我有一些数据,我对其进行分组、计数,并按计数排序。我想保留前X行,并将其他行累积为一行,命名为"其他"。

以下是在DB Fiddle上提供的数据

CREATE TABLE cities (name TEXT);
INSERT INTO cities(name) VALUES ('paris'), ('paris'), ('london'), ('london'), ('london'), ('london'), ('rome'), ('madrid');

SELECT name, COUNT(name) AS count FROM cities
GROUP BY name
ORDER BY count DESC
LIMIT 2

上面的代码生成了出现次数最多的两个城市(具有最多条目的城市):

| name   | count |
| ------ | ----- |
| london | 4     |
| paris  | 2     |

我想要实现的是:

| name   | count |
| ------ | ----- |
| london | 4     |
| paris  | 2     |
| other  | 2     |

其中,所有其余的行(在我的例子中为'rome'和'madrid')都在'other'下合并在一起。

我不太了解SQL,但在伪代码中,我考虑了类似以下的内容:

SELECT name, COUNT(name) AS count FROM cities
GROUP BY name
ORDER BY count DESC
LIMIT 2 AS top_2
LIMIT 2+ AS other

是否可以在单次查询中实现这个目标?作为一种权宜之计,我将执行相同的查询,但以相反的方式排序,并限制为"行数 - 前2行"(所以总共有3个查询,我想)。

(我很惊讶这个问题以前没有被提出过,但我找不到类似的问题。)

如您所请求,这是翻译好的部分。如果您需要进一步的帮助,请随时提问。

英文:

Summary: I have data that I group, count, and sort by count. I would like to keep only the first X lines, and have others accumulated as a line called "other"

The data below is available at DB Fiddle

CREATE TABLE cities (name TEXT);
INSERT INTO cities(name) VALUES ('paris'), ('paris'), ( 'london'), ( 'london'), ( 'london'), ( 'london'), ( 'rome'), ( 'madrid');

SELECT name, COUNT(name) AS count FROM cities
GROUP BY name
ORDER BY count DESC
LIMIT 2

The code above generates the top two most present cities (the ones that have the largest number of entries):

| name   | count |
| ------ | ----- |
| london | 4     |
| paris  | 2     |

What I would like to achieve is

| name   | count |
| ------ | ----- |
| london | 4     |
| paris  | 2     |
| other  | 2     |

where all remaining rows ('rome' and 'madrid' in my case) are grouped together under other.

I do not know SQL very much but in pseudo-code I am thinking about something like

SELECT name, COUNT(name) AS count FROM cities
GROUP BY name
ORDER BY count DESC
LIMIT 2 AS top_2
LIMIT 2+ AS other

Is this possible in a single pass? As a workaround I will have the same query, but sorted the other way round and limited to "number of rows - the top 2" (so 3 queries in total I think).

(I am surprised that this has not been asked before, but could not find a match)

答案1

得分: 6

以下是您要翻译的部分:

一种选择是在row_number函数上使用一个case表达式来使用另一级别的汇总:

WITH T AS
(
  SELECT
     name, 
     COUNT(name) AS count,
     ROW_NUMBER() OVER (ORDER BY COUNT(name) DESC) AS rn 
  FROM cities
  GROUP BY name
  ORDER BY count DESC
)
SELECT 
   CASE 
     WHEN rn IN (1, 2) 
     THEN name 
     ELSE 'Others' 
   END AS name,
   SUM(count) AS count
FROM T
GROUP BY 1
ORDER BY count DESC

为了确保'Others'出现在结果集的最后,您可以使用以下方式进行排序:ORDER BY CASE WHEN rn IN (1, 2) THEN 1 ELSE 2 END, count DESC

英文:

One option is to use another level of aggregation with a case expression on a row_number function:

WITH T AS
(
  SELECT
     name, 
     COUNT(name) AS count,
     ROW_NUMBER() OVER (ORDER BY COUNT(name) DESC) AS rn 
  FROM cities
  GROUP BY name
  ORDER BY count DESC
)
SELECT 
   CASE 
     WHEN rn IN (1, 2) 
     THEN name 
     ELSE 'Others' 
   END AS name,
   SUM(count) AS count
FROM T
GROUP BY 1
ORDER BY count DESC

To ensure that 'Other' appear last in the result set, you may ORDER BY CASE WHEN rn IN (1, 2) THEN 1 ELSE 2 END, count DESC

Demo

答案2

得分: 0

以下是已经翻译好的部分:

使用通用表达式(Common Table Expression):

WITH
	CTE_All (ct_all) AS
		(SELECT COUNT(name)
		FROM cities),
	CTE_Top (ct_name, ct_count) AS
		(SELECT TOP 2 name, COUNT(name)
		FROM cities
		GROUP BY name
		HAVING COUNT(name) > 1
		ORDER BY COUNT(name) DESC)

SELECT * FROM CTE_Top
UNION ALL
SELECT 'other', (SELECT ct_all FROM CTE_All) - SUM(ct_count)
	FROM CTE_Top;
英文:

With common table expression:

WITH
	CTE_All (ct_all) AS
		(SELECT COUNT(name)
		FROM cities),
	CTE_Top (ct_name, ct_count) AS
		(SELECT TOP 2 name, COUNT(name)
		FROM cities
		GROUP BY name
		HAVING COUNT(name) > 1
		ORDER BY COUNT(name) DESC)

SELECT * FROM CTE_Top
UNION ALL
SELECT 'other', (SELECT ct_all FROM CTE_All) - SUM(ct_count)
	FROM CTE_Top;

huangapple
  • 本文由 发表于 2023年5月29日 19:35:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76356990.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定