英文:
(BigQuery) Left join to gather lowest data from Left table that is greater than data from right table
问题
我需要使用左连接将表格与右表中仅大于左表中其他值的最低值进行连接。
我可以使用ROW_NUMBER和子查询来完成这个任务,但问题是我需要对来自5个不同表的5个操作执行相同的操作,我不希望生成一个长达10页的查询,难以供我的同事审查。
我有一个名为Table.Users的表,用于测量用户执行操作的时间。我想要将其与Table.Action1左连接,以收集在Table.Users中的操作之后首次发生的action1事件。
用户可以在两个表中多次出现。
是否有任何方法可以在多个左连接中仅收集每个表的一行,而不需要多个子查询?
我不能使用聚合,因为我需要从与特定操作相关联的action表中获取许多信息。
我的数据将如下所示:
Table.Users:
| User_id | Date |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 1 | 6 |
Table.Action1:
| User_id | Date | Action.id |
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 5 | 3 |
| 1 | 6 | 4 |
理想结果:
Table.Users:
| User_id | Date | Action1.Date | Action1.id |
| 1 | 1 | 2 | 2 |
| 1 | 2 | 5 | 3 |
| 1 | 3 | 5 | 3 |
| 1 | 4 | 5 | 3 |
| 1 | 5 | 6 | 4 |
| 1 | 6 | | |
我以前使用多个子查询和ROW_NUMBER解决了这个问题,但现在我有更多需要连接的表格,不想生成一个非常长且难以理解的查询。
英文:
I need to LEFT JOIN a table with only the lowest values in the right table that are greater than other values in the left table.
I can do this with ROW_NUMBER and Subqueries but the issue is that I need to do the same with about 5 Actions from 5 different tables and I dont want a 10-page long query that is difficult for my coworkers to audit.
I have Table.Users that measures when a user does an action. I want to left join it with Table.Action1 to gather the first incident of action1 that occurs after the action in Table.Users.
Users can appear multiple times in both tables.
Is there any way to gather only one row each from multiple LEFT JOINs without multiple subqueries?
I cant use aggregations as there is a lot of info I need from the action tables that is connected to the specific actions.
My data would look like:
Table.Users:
| User_id | Date |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 1 | 6 |
Table.Action1
| User_id | Date |Action.id|
| 1 | 1 |1|
| 1 | 2 |2|
| 1 | 5 |3|
| 1 | 6 |4|
Ideal Result:
Table.Users:
| User_id | Date |Action1.Date| Action1.id
| 1 | 1 |2 |2|
| 1 | 2 |5 |3|
| 1 | 3 |5 |3|
| 1 | 4 |5 |3|
| 1 | 5 |6 |4|
| 1 | 6 | | |
I have solved this before with multiple subqueries and ROW_NUMBER but I have many more tables that need joining now and dont want a very long query that is difficult for a co-worker to understand.
答案1
得分: 0
假设在 actions
表中不存在重复的 (user_id, dt)
,如您提供的示例数据所示,一种选项是使用 left join
来获取匹配的操作(如果有的话),然后使用窗口函数 last_value()
和选项 ignore nulls
来检索相关的操作日期和id。关键是要仔细调整窗口子句:
select u.*,
last_value(a.dt ignore nulls) over w as last_action_dt,
last_value(a.id ignore nulls) over w as last_action_id
from users u
left join actions a on a.user_id = u.user_id and a.dt = u.dt
window w as (
partition by u.user_id
order by u.dt desc
rows between unbounded preceding and 1 preceding
)
order by u.user_id, u.dt
顺便提一下:我在查询中将列名 date
重命名为 dt
,以避免与相应的SQL关键字冲突。
英文:
Assuming no duplicate (user_id, dt)
in the actions
table, as shown in your sample data, one option uses a left join
to bring the matching action (if any), and then window function last_value()
with option ignore nulls
to retrieve the relevant action date and id. The trick is to carefully adjust the window clause:
select u.*,
last_value(a.dt ignore nulls) over w last_action_dt,
last_value(a.id ignore nulls) over w last_action_id
from users u
left join actions a on a.user_id = u.user_id and a.dt = u.dt
window w as (
partition by u.user_id
order by u.dt desc
rows between unbounded preceding and 1 preceding
)
order by u.user_id, u.dt
Side note: I renamed column date
to dt
in the query, so it does not clash with the corresponding SQL keyword.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论