2023-05-29 19:35:32

How to create an "other" group when using GROUP BY?

Question

Summary: I have data that I group, count, and sort by count. I would like to keep only the first X rows and have all the others accumulated into a single row called "other".

The data below is available at DB Fiddle.

CREATE TABLE cities (name TEXT);
INSERT INTO cities(name) VALUES ('paris'), ('paris'), ('london'), ('london'), ('london'), ('london'), ('rome'), ('madrid');
SELECT name, COUNT(name) AS count FROM cities
GROUP BY name
ORDER BY count DESC
LIMIT 2

The query above returns the two most frequent cities (the ones with the largest number of rows):

| name   | count |
| ------ | ----- |
| london | 4     |
| paris  | 2     |

What I would like to achieve is:

| name   | count |
| ------ | ----- |
| london | 4     |
| paris  | 2     |
| other  | 2     |

where all remaining rows ('rome' and 'madrid' in my case) are grouped together under 'other'.

I do not know SQL very well, but in pseudocode I am thinking of something like:

SELECT name, COUNT(name) AS count FROM cities
GROUP BY name
ORDER BY count DESC
LIMIT 2 AS top_2
LIMIT 2+ AS other

Is this possible in a single pass? As a workaround, I would run the same query sorted the other way round and limited to "number of rows minus the top 2" (so three queries in total, I think).

(I am surprised that this has not been asked before, but I could not find a match.)

Answer 1

Score: 6

One option is to add a second level of aggregation, using a CASE expression on ROW_NUMBER() to decide which rows keep their own name:

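WITH T AS
(
  -- Count the rows per city and rank the cities by that count.
  -- (No ORDER BY is needed here: the outer query sorts the final result.)
  SELECT
     name,
     COUNT(name) AS count,
     ROW_NUMBER() OVER (ORDER BY COUNT(name) DESC) AS rn
  FROM cities
  GROUP BY name
)
SELECT
   -- Keep the two top-ranked names; fold every other city into 'Others'
   CASE
     WHEN rn IN (1, 2)
     THEN name
     ELSE 'Others'
   END AS name,
   SUM(count) AS count
FROM T
GROUP BY 1
ORDER BY count DESC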

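To ensure that 'Others' appears last in the result set, you can sort on the rank first. Because rn is not a grouping column in the outer query, PostgreSQL and MySQL in its default mode reject a bare reference to it there, so wrap it in an aggregate: ORDER BY CASE WHEN MIN(rn) IN (1, 2) THEN 1 ELSE 2 END, count DESC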
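The two-level aggregation can be tried end to end with a small script. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an SQLite build with window-function support (3.25 or later). The helper name top_n_with_other, the top_n parameter, and the 'other' label are illustrative choices, not part of the answer. It splits counting, ranking, and labelling into separate CTEs and sorts by MIN(rn), so the catch-all row comes last even when its total exceeds a top city's count.

```python
import sqlite3

# Sketch only: the CTE names, the :top_n parameter and the 'other' label are
# assumptions made for this example. COUNT(*) equals the answer's COUNT(name)
# as long as name contains no NULLs.
QUERY = """
WITH counted AS (
    SELECT name, COUNT(*) AS cnt
    FROM cities
    GROUP BY name
),
ranked AS (
    -- name is a tie-breaker so that the ranking is deterministic
    SELECT name, cnt,
           ROW_NUMBER() OVER (ORDER BY cnt DESC, name) AS rn
    FROM counted
),
labelled AS (
    SELECT CASE WHEN rn <= :top_n THEN name ELSE 'other' END AS label,
           cnt, rn
    FROM ranked
)
SELECT label, SUM(cnt) AS total
FROM labelled
GROUP BY label
ORDER BY MIN(rn)  -- every row folded into 'other' has rn > top_n, so it sorts last
"""


def top_n_with_other(conn, top_n):
    """Return (label, total) rows: the top_n cities, then one 'other' row."""
    return conn.execute(QUERY, {"top_n": top_n}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT)")
conn.executemany(
    "INSERT INTO cities(name) VALUES (?)",
    [(c,) for c in ["paris", "paris", "london", "london",
                    "london", "london", "rome", "madrid"]],
)

for row in top_n_with_other(conn, 2):
    print(row)
```

With the sample data from the question this prints ('london', 4), ('paris', 2) and ('other', 2), in that order. A real city that is literally named 'other' would be merged into the catch-all row, so a different label may be needed for such data.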


Answer 2

Score: 0

With common table expressions (this version uses TOP, which is SQL Server syntax):

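WITH
	-- Total number of rows in the table
	CTE_All (ct_all) AS
		(SELECT COUNT(name)
		FROM cities),
	-- The two most frequent cities (engines without TOP use LIMIT 2 instead).
	-- Note that HAVING excludes cities that occur only once from the top two.
	CTE_Top (ct_name, ct_count) AS
		(SELECT TOP 2 name, COUNT(name)
		FROM cities
		GROUP BY name
		HAVING COUNT(name) > 1
		ORDER BY COUNT(name) DESC)
SELECT * FROM CTE_Top
UNION ALL
-- 'other' is the total minus everything already counted in the top two
SELECT 'other', (SELECT ct_all FROM CTE_All) - SUM(ct_count)
	FROM CTE_Top;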
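The same "total minus top" idea can be written with LIMIT for engines that lack TOP. The following is a minimal sketch, assuming Python's built-in sqlite3 module. The names top_minus_total, top_cities, combined, and is_other are illustrative and do not come from the answer. It leaves out the HAVING filter, adds an explicit sort key because UNION ALL by itself does not guarantee row order, and hides the 'other' row when nothing is left over.

```python
import sqlite3

# Sketch only: CTE names, the is_other sort key and the :top_n parameter are
# assumptions made for this example.
QUERY = """
WITH top_cities AS (
    SELECT name, COUNT(*) AS cnt
    FROM cities
    GROUP BY name
    ORDER BY cnt DESC, name      -- name breaks ties deterministically
    LIMIT :top_n
),
combined AS (
    SELECT name, cnt, 0 AS is_other FROM top_cities
    UNION ALL
    -- everything not counted above: total rows minus the rows in the top groups
    SELECT 'other',
           (SELECT COUNT(*) FROM cities)
             - (SELECT COALESCE(SUM(cnt), 0) FROM top_cities),
           1
)
SELECT name, cnt
FROM combined
WHERE is_other = 0 OR cnt > 0    -- drop an empty 'other' row
ORDER BY is_other, cnt DESC, name
"""


def top_minus_total(conn, top_n):
    """Return the top_n cities followed by an 'other' row, if any rows remain."""
    return conn.execute(QUERY, {"top_n": top_n}).fetchall()


conn2 = sqlite3.connect(":memory:")
conn2.execute("CREATE TABLE cities (name TEXT)")
conn2.executemany(
    "INSERT INTO cities(name) VALUES (?)",
    [(c,) for c in ["paris", "paris", "london", "london",
                    "london", "london", "rome", "madrid"]],
)

for row in top_minus_total(conn2, 2):
    print(row)
```

Compared with the ranking approach, this variant reads the table twice (once for the total, once for the grouped counts), which may matter on large tables.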
