How to get the most recent n records per group and aggregate them into an array
Question
I know this is a common problem, and I've seen it asked here and here, but neither of those solutions works for me.
I have a table with the fields session_time, user and email (these names are just dummy placeholders; my real schema is different, so please don't read too much into them).
Supposing each user can have several emails related to it, I need to output, per row:
1 - a user
2 - an array consisting of every email related to that user
However, some users have WAY too many emails, so I need to limit them to the most recent entries, let's say the 2 most recent per user.
WITH
sample_data AS (
    SELECT
        '2023-06-12 10:00:00' AS session_time,
        '1' AS user,
        'example@example.com' AS email
    UNION ALL
    SELECT
        '2023-06-12 11:00:00' AS session_time,
        '2' AS user,
        'example@example.com' AS email
    UNION ALL
    SELECT
        '2023-06-12 12:00:00' AS session_time,
        '3' AS user,
        'example@example.com' AS email
    UNION ALL
    SELECT
        '2023-06-12 13:00:00' AS session_time,
        '3' AS user,
        'example2@example.com' AS email
    UNION ALL
    SELECT
        '2023-06-12 14:00:00' AS session_time,
        '3' AS user,
        'example3@example.com' AS email
)
, filtered_user_email AS (
    SELECT
        user,
        email,
        ROW_NUMBER() OVER(PARTITION BY user, email ORDER BY MAX(session_time) DESC) row_num
    FROM
        sample_data
    GROUP BY
        user,
        email
)
Running
SELECT user, array_agg(DISTINCT email) AS emails FROM filtered_user_email WHERE row_num <= 2 GROUP BY user
outputs every row, because every row gets a row_num of 1. So for user 3, I get 3 emails instead of just the two most recent ones.
How can I select just the 2 most recent emails per user?
Answer 1
Score: 1
Presumably, you just need to change the partition of your row_number:
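-- usr here corresponds to the question's user column
select usr, array_agg(email) as emails
from (
    select usr, email,
           -- partition by usr only, so each user's emails are ranked against one another;
           -- desc ranks the most recently seen email first
           row_number() over (partition by usr order by max(session_time) desc) as rn
    from sample_data s
    group by usr, email
) t
where rn <= 2
group by usr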
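To check the idea end to end, here is a minimal sketch that runs the same per-user ranking against the question's sample data using Python's built-in sqlite3 module. Several things here are assumptions rather than part of the original thread: SQLite stands in for whatever engine you actually use, it has no array_agg, so the arrays are assembled in Python, and the CTE names latest and ranked, the last_seen column, and the helper latest_emails_per_user are hypothetical. The aggregation and the ranking are also split into two steps for readability, which should be equivalent to ranking by max(session_time) directly.

```python
import sqlite3

# Sample rows copied from the question: (session_time, usr, email).
ROWS = [
    ("2023-06-12 10:00:00", "1", "example@example.com"),
    ("2023-06-12 11:00:00", "2", "example@example.com"),
    ("2023-06-12 12:00:00", "3", "example@example.com"),
    ("2023-06-12 13:00:00", "3", "example2@example.com"),
    ("2023-06-12 14:00:00", "3", "example3@example.com"),
]

QUERY = """
WITH latest AS (
    -- one row per (usr, email) with the last time that email was seen
    SELECT usr, email, MAX(session_time) AS last_seen
    FROM sample_data
    GROUP BY usr, email
),
ranked AS (
    -- rank each user's emails, most recent first
    SELECT usr, email,
           ROW_NUMBER() OVER (PARTITION BY usr ORDER BY last_seen DESC) AS rn
    FROM latest
)
SELECT usr, email
FROM ranked
WHERE rn <= 2
ORDER BY usr, rn
"""


def latest_emails_per_user(rows):
    """Return {usr: [emails]} holding at most the 2 most recent emails per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample_data (session_time TEXT, usr TEXT, email TEXT)"
    )
    conn.executemany("INSERT INTO sample_data VALUES (?, ?, ?)", rows)
    result = {}
    # Stand-in for array_agg: collect the emails into a list per user.
    for usr, email in conn.execute(QUERY):
        result.setdefault(usr, []).append(email)
    conn.close()
    return result


result = latest_emails_per_user(ROWS)
print(result)
```

With this data, user 3 should end up with only example3@example.com and example2@example.com, while users 1 and 2 keep their single email. Window functions need SQLite 3.25 or newer, which most current Python builds ship with.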