How to get the most recent n records per group and aggregate them into an array

Question

I know this is a common problem, and I've seen it asked here and here, but neither of those solutions is working for me.

I have a table with fields session_time, user and email (these fields are just dummy placeholders; my real schema is different, so please don't read too much into their names). Supposing each user can have several emails related to it, I need to output, per row:

1 - a user

2 - an array consisting of every email related to the user

However, some users have WAY too many emails, so I need to limit them to the most recent entries, let's say the 2 most recent per user.

WITH
  sample_data AS (
  SELECT
    '2023-06-12 10:00:00' AS session_time,
    '1' AS user,
    'example@example.com' AS email
  UNION ALL
  SELECT
    '2023-06-12 11:00:00' AS session_time,
    '2' AS user,
    'example@example.com' AS email
  UNION ALL
  SELECT
    '2023-06-12 12:00:00' AS session_time,
    '3' AS user,
    'example@example.com' AS email
  UNION ALL
  SELECT
    '2023-06-12 13:00:00' AS session_time,
    '3' AS user,
    'example2@example.com' AS email
  UNION ALL
  SELECT
    '2023-06-12 14:00:00' AS session_time,
    '3' AS user,
    'example3@example.com' AS email
)

, filtered_user_email AS (
SELECT
  user,
  email,
  ROW_NUMBER() OVER(PARTITION BY user, email ORDER BY MAX(session_time) DESC) row_num
FROM
  sample_data
GROUP BY
  uuid,
  user
)

Running SELECT user, array_agg(distinct email) as emails FROM filtered_user_email WHERE row_num <= 2 group by user outputs every row, because every row gets a row_num of 1. So for user 3, I get 3 entries instead of just the two most recent ones.

How can I then just select the 2 most recent emails per user?

Answer 1

Score: 1

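Presumably, you just need to change the partition of your row_number so that it partitions by the user only. Partitioning by both user and email puts every email in its own partition, which is why each row gets a row number of 1: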

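-- usr stands in for the question's user column
select usr, array_agg(email) emails
from (
    -- one row per (usr, email); rank each user's emails, newest first
    select usr, email,
        row_number() over(partition by usr order by max(session_time) desc) rn
    from sample_data s
    group by usr, email
) t
where rn <= 2  -- keep the 2 most recent emails per user
group by usr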
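
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea run against the question's sample rows. It is not taken from the original answer. It assumes PostgreSQL-style syntax (a VALUES list inside a CTE, and ORDER BY inside ARRAY_AGG), uses usr in place of user, and introduces the names latest_per_email, ranked and last_seen purely for illustration. Other dialects may need small changes.

```sql
-- Sketch: the two most recent distinct emails per user, newest first.
-- Assumptions: PostgreSQL-style syntax; usr replaces the question's user column;
-- timestamps are plain ISO-formatted strings, as in the question's sample data,
-- so comparing them as text matches chronological order.
WITH sample_data (session_time, usr, email) AS (
  VALUES
    ('2023-06-12 10:00:00', '1', 'example@example.com'),
    ('2023-06-12 11:00:00', '2', 'example@example.com'),
    ('2023-06-12 12:00:00', '3', 'example@example.com'),
    ('2023-06-12 13:00:00', '3', 'example2@example.com'),
    ('2023-06-12 14:00:00', '3', 'example3@example.com')
),
latest_per_email AS (
  -- One row per (usr, email), keeping that email's latest session_time.
  SELECT usr, email, MAX(session_time) AS last_seen
  FROM sample_data
  GROUP BY usr, email
),
ranked AS (
  -- Rank each user's emails from most to least recently seen.
  SELECT usr, email, last_seen,
         ROW_NUMBER() OVER (PARTITION BY usr ORDER BY last_seen DESC) AS rn
  FROM latest_per_email
)
SELECT usr, ARRAY_AGG(email ORDER BY last_seen DESC) AS emails
FROM ranked
WHERE rn <= 2          -- keep at most 2 emails per user
GROUP BY usr
ORDER BY usr;
```

With the sample rows, user 3 should come back with example3@example.com and example2@example.com only, while users 1 and 2 each keep their single email. Splitting the deduplication and the ranking into separate CTEs is a readability choice; the single subquery in the answer above does the same work in one step.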

huangapple
  • Posted on June 12, 2023, 22:22:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76457597.html
