Getting only the highest counts per type in Postgres
Question
Say I have a table with kids and their toys.
CREATE TABLE kids_toys (
    kid_name character varying,
    toy_type character varying,
    toy_name character varying
);
kid_name | toy_type | toy_name |
---|---|---|
Edward | bear | Pooh |
Edward | bear | Pooh2 |
Edward | bear | Simba |
Edward | car | Vroom |
Lydia | doll | Sally |
Lydia | car | Beeps |
Lydia | car | Speedy |
Edward | car | Red |
I want to get a list of the most popular toy type for each kid, grouped by kid. So the result would be:
kid_name | toy_type | count |
---|---|---|
Edward | bear | 3 |
Lydia | car | 2 |
Assuming Postgres 15 as the engine, how would I query to do this? I keep getting stuck on how to generate the count but then only take the max result from each per-kid count.
Answer 1
Score: 3
In Postgres, I would recommend distinct on, which gets the job done in a single query, with no subquery:
select distinct on (kid_name) kid_name, toy_type, count(*) cnt
from kids_toys
group by kid_name, toy_type
order by kid_name, count(*) desc, toy_type
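The query groups the dataset by kid and toy type, producing one row per pair together with its count. Then distinct on (kid_name) keeps only the first row for each kid; the order by clause sorts each kid's rows by count descending, so that first row is the kid's most popular toy type. If there is a tie, the trailing toy_type in the order by breaks it alphabetically.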
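To make the role of the order by clause concrete, here is a small sketch that runs the same grouping and ordering without distinct on, so the rows it chooses from are visible. The expected rows in the comments are worked out by hand from the sample data in the question.

```sql
-- Same grouping and ordering as the query above, but without DISTINCT ON,
-- to show the candidate rows that DISTINCT ON (kid_name) picks from.
select kid_name, toy_type, count(*) as cnt
from kids_toys
group by kid_name, toy_type
order by kid_name, count(*) desc, toy_type;

-- With the sample data from the question:
--  kid_name | toy_type | cnt
--  Edward   | bear     | 3     <- first row for Edward, kept by DISTINCT ON
--  Edward   | car      | 2
--  Lydia    | car      | 2     <- first row for Lydia, kept by DISTINCT ON
--  Lydia    | doll     | 1
```

distinct on (kid_name) simply keeps the first row of each kid_name group in this ordering, which is why the leading order by expression has to be kid_name.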
If you want to retain ties (which Postgres' distinct on cannot do), you can order by rank() and use the with ties option of fetch first instead (with ties is available from Postgres 13 onward):
select kid_name, toy_type, count(*) cnt
from kids_toys
group by kid_name, toy_type
order by rank() over(partition by kid_name order by count(*) desc)
fetch first row with ties
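As an alternative sketch of the same tie-retaining idea, the rank can be computed in a subquery and filtered explicitly. This form does not rely on with ties and lets the final result be ordered freely. The names ranked and rnk are illustrative only and are not part of the original answer.

```sql
-- Keep every toy type that shares the top count for its kid.
select kid_name, toy_type, cnt
from (
    select kid_name,
           toy_type,
           count(*) as cnt,
           -- rank() gives all rows tied for the highest count rank 1
           rank() over (partition by kid_name order by count(*) desc) as rnk
    from kids_toys
    group by kid_name, toy_type
) as ranked
where rnk = 1
order by kid_name, toy_type;
```

With the sample data there are no ties, so this returns the same two rows as the distinct on query. If Edward owned a third car, both his bear and car rows would come back with a count of 3.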
Answer 2
Score: 2
First, group by kid_name and toy_type to find how many toys of each type each kid has.
Then add a row_number window function, partitioned only by kid_name and ordered by the count descending, to number each toy_type from highest count to lowest for each individual kid.
Lastly, keep only the records with row_num = 1. Note that row_number numbers tied rows in an unspecified order unless the window's order by includes a tie-breaker.
If you would like the top 3 toy types per kid instead, use row_num <= 3.
select kid_name, toy_type, cnt
from (
    select kid_name, toy_type, cnt,
           row_number() over (partition by kid_name order by cnt desc) as row_num
    from (
        select kid_name, toy_type, count(*) as cnt
        from kids_toys
        group by kid_name, toy_type
    ) as grouped
) as with_row_num
where row_num = 1
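A possible variation, sketched here under the assumption that ties should be broken alphabetically by toy_type, folds the two subqueries into one and shows the top-3 filter. The tie-breaker and the final ordering are additions for illustration, not part of the original answer.

```sql
-- Top 3 toy types per kid, with a deterministic tie-breaker.
select kid_name, toy_type, cnt
from (
    select kid_name,
           toy_type,
           count(*) as cnt,
           row_number() over (
               partition by kid_name
               order by count(*) desc, toy_type  -- toy_type breaks ties alphabetically
           ) as row_num
    from kids_toys
    group by kid_name, toy_type
) as with_row_num
where row_num <= 3   -- use row_num = 1 to keep only the most popular type
order by kid_name, row_num;
```

Window functions are evaluated after grouping, so count(*) can be used directly inside the over clause without the extra nesting level.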