How to create an "other" group when using GROUP BY?
Question
Summary: I have data that I group, count, and sort by count. I would like to keep only the first X rows and have all the others accumulated into a single row called "other".
The data below is available at DB Fiddle
CREATE TABLE cities (name TEXT);
INSERT INTO cities(name) VALUES ('paris'), ('paris'), ('london'), ('london'), ('london'), ('london'), ('rome'), ('madrid');
SELECT name, COUNT(name) AS count FROM cities
GROUP BY name
ORDER BY count DESC
LIMIT 2
The code above generates the top two most present cities (the ones that have the largest number of entries):
| name | count |
| ------ | ----- |
| london | 4 |
| paris | 2 |
What I would like to achieve is:
| name | count |
| ------ | ----- |
| london | 4 |
| paris | 2 |
| other | 2 |
where all remaining rows ('rome' and 'madrid' in my case) are grouped together under other.
I do not know much SQL, but in pseudo-code I am thinking of something like:
SELECT name, COUNT(name) AS count FROM cities
GROUP BY name
ORDER BY count DESC
LIMIT 2 AS top_2
LIMIT 2+ AS other
Is this possible in a single pass? As a workaround, I would run the same query again, sorted the other way round and limited to "number of rows minus the top 2" (so three queries in total, I think).
(I am surprised that this has not been asked before, but I could not find a match.)
Answer 1
Score: 6
One option is to use another level of aggregation with a case expression on a row_number function:
WITH T AS
(
SELECT
name,
COUNT(name) AS count,
ROW_NUMBER() OVER (ORDER BY COUNT(name) DESC) AS rn
FROM cities
GROUP BY name
ORDER BY count DESC
)
SELECT
CASE
WHEN rn IN (1, 2)
THEN name
ELSE 'Others'
END AS name,
SUM(count) AS count
FROM T
GROUP BY 1
ORDER BY count DESC
To ensure that 'Others' appears last in the result set, you may use ORDER BY CASE WHEN rn IN (1, 2) THEN 1 ELSE 2 END, count DESC.
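For anyone who wants to try this end to end, here is a minimal sketch that runs the same idea against an in-memory SQLite database through Python's `sqlite3` module (window functions need SQLite 3.25 or newer). The helper names `make_cities_db` and `top_n_with_others`, the `top_n` parameter, and the `cnt`/`label` column names are made up for illustration. Two details differ from the query above and are assumptions on my part: the label is computed in a second CTE so the final GROUP BY is unambiguous, and the rows are ordered by `MIN(rn)`, since some engines (PostgreSQL, for example) reject an ungrouped `rn` in ORDER BY. A tie-breaker on `name` is added so the ranking is deterministic.

```python
import sqlite3


def make_cities_db():
    """Build an in-memory SQLite database holding the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cities (name TEXT)")
    names = ["paris"] * 2 + ["london"] * 4 + ["rome", "madrid"]
    conn.executemany("INSERT INTO cities(name) VALUES (?)", [(n,) for n in names])
    return conn


def top_n_with_others(conn, top_n):
    """Return the top_n names by count, with all remaining rows summed as 'Others'."""
    query = """
    WITH ranked AS (
        -- One row per city, ranked by count (name breaks ties).
        SELECT name,
               COUNT(name) AS cnt,
               ROW_NUMBER() OVER (ORDER BY COUNT(name) DESC, name) AS rn
        FROM cities
        GROUP BY name
    ),
    labeled AS (
        -- Keep the name for the top rows, relabel everything else.
        SELECT CASE WHEN rn <= :top_n THEN name ELSE 'Others' END AS label,
               cnt,
               rn
        FROM ranked
    )
    SELECT label, SUM(cnt) AS cnt
    FROM labeled
    GROUP BY label
    ORDER BY MIN(rn)  -- 'Others' has the largest minimum rank, so it sorts last
    """
    return conn.execute(query, {"top_n": top_n}).fetchall()


rows = top_n_with_others(make_cities_db(), 2)
print(rows)
```

Ordering by `MIN(rn)` keeps the named rows in rank order and pushes 'Others' to the end even when its summed count is larger than one of the named rows.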
Answer 2
Score: 0
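With common table expressions (note that TOP 2 is SQL Server style syntax; engines such as PostgreSQL, MySQL, and SQLite use LIMIT 2 instead):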
WITH
CTE_All (ct_all) AS
(SELECT COUNT(name)
FROM cities),
CTE_Top (ct_name, ct_count) AS
(SELECT TOP 2 name, COUNT(name)
FROM cities
GROUP BY name
HAVING COUNT(name) > 1
ORDER BY COUNT(name) DESC)
SELECT * FROM CTE_Top
UNION ALL
SELECT 'other', (SELECT ct_all FROM CTE_All) - SUM(ct_count)
FROM CTE_Top;
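Since the question's own query uses LIMIT, here is a rough sketch of the same "total minus top rows" idea written with LIMIT and run on an in-memory SQLite database via Python's `sqlite3` module. The helper names `make_cities_db_limit` and `top_n_union_other` and the `top_n` parameter are hypothetical. This is a variation, not the query above verbatim: the `HAVING COUNT(name) > 1` filter is left out so that the top rows are chosen purely by count, and a tie-breaker on `name` is added.

```python
import sqlite3


def make_cities_db_limit():
    """Build an in-memory SQLite database holding the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cities (name TEXT)")
    names = ["paris"] * 2 + ["london"] * 4 + ["rome", "madrid"]
    conn.executemany("INSERT INTO cities(name) VALUES (?)", [(n,) for n in names])
    return conn


def top_n_union_other(conn, top_n):
    """Return the top_n (name, count) rows plus one 'other' row for the rest."""
    query = """
    WITH
    cte_all (ct_all) AS (
        -- Total number of rows in the table.
        SELECT COUNT(name) FROM cities
    ),
    cte_top (ct_name, ct_count) AS (
        -- The top_n most frequent names (LIMIT instead of TOP).
        SELECT name, COUNT(name)
        FROM cities
        GROUP BY name
        ORDER BY COUNT(name) DESC, name
        LIMIT :top_n
    )
    SELECT ct_name, ct_count FROM cte_top
    UNION ALL
    -- Whatever is not covered by the top rows ends up under 'other'.
    SELECT 'other', (SELECT ct_all FROM cte_all) - SUM(ct_count)
    FROM cte_top
    """
    return conn.execute(query, {"top_n": top_n}).fetchall()


union_rows = top_n_union_other(make_cities_db_limit(), 2)
print(union_rows)
```

The row order of a UNION ALL is not guaranteed without an outer ORDER BY, so code that depends on 'other' coming last should sort explicitly.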