如何根据共同的列值从两个表中包括列,但只包括左表中的行?

huangapple go评论59阅读模式
英文:

How to include columns from two tables based on a common column value but include rows only from the left table?

问题

我有两个表格 tb1tb2tb2 通过一个名为 tk_id 的外键与 tb1 相连接。以下是我的两个表格的样子:

tb1

     列名      |            数据类型             | 校对规则 | 允许空值 |                     默认值                      
-----------------+-----------------------------+-----------+----------+--------------------------------------------------
 id              | 整数                     |           | 非空 | 下一个值('tb2_id_seq'::regclass)
 创建时间      | 无时区时间戳 |           | 非空 | 
 修改时间     | 无时区时间戳 |           | 非空 | 
 状态          | 双精度浮点数            |           | 非空 | 
 tk_id           | UUID                        |           | 非空 | 

tb2

     列名     |            数据类型             | 校对规则 | 允许空值 | 默认值 
----------------+-----------------------------+-----------+----------+---------
 id             | UUID                        |           | 非空 | 
 创建时间     | 无时区时间戳 |           | 非空 | 
 修改时间    | 无时区时间戳 |           | 非空 | 
 目标ID | UUID                        |           | 非空 | 
 源ID      | UUID                        |           | 非空 | 
 tk_id          | UUID                        |           | 非空 | 

现在我需要从 tb1 获取所有行,这些行具有来自 tb1tb2 的列,当 tk_id 值匹配两个行时。

这是我尝试过的:

select tb1.created_at, tb1.status, tb2.source_id, tb2.destination_id from tb1 
inner join tb2 on tb1.tk_id = tb2.tk_id where 
tb1.created_at > timezone('utc', now()) - interval '40 minutes';

但我得到了太多的行。通常在 40分钟 的时间间隔内,会有大约 800 条记录,但在连接后,我得到了大约 100,000+ 条记录。

编辑: 经过一些阅读和几次尝试,我在查询中进行了一些更改,并成功将行数减少到了预期的行数。这是我的查询现在:

SELECT count(*) FROM tb1 LEFT OUTER JOIN (SELECT DISTINCT tk_id FROM tb2) t2 
ON tb1.tk_id = t2.tk_id where tb1.created_at > timezone('utc', now()) - 
interval '40 minutes';

但现在我无法在我的选择查询中获取 tb2 的列。

我做错了什么?

编辑2: 如果我之前没有表达清楚,我很抱歉。连接条件应该基于右表中 tk_id 的最新出现。因此,对于左表的每一行 (tb1),它应该与右表的 tk_id 的最新出现匹配,并提取右表的列。

英文:

I have two tables tb1 and tb2. tb2 is connected to tb1 via a foreign key named tk_id. Here is how my two tables look like

tb1

     Column      |            Type             | Collation | Nullable |                     Default                      
-----------------+-----------------------------+-----------+----------+--------------------------------------------------
 id              | integer                     |           | not null | nextval('tb2_id_seq'::regclass)
 created_at      | timestamp without time zone |           | not null | 
 modified_at     | timestamp without time zone |           | not null | 
 status          | double precision            |           | not null | 
 tk_id           | uuid                        |           | not null | 

tb2

     Column     |            Type             | Collation | Nullable | Default 
----------------+-----------------------------+-----------+----------+---------
 id             | uuid                        |           | not null | 
 created_at     | timestamp without time zone |           | not null | 
 modified_at    | timestamp without time zone |           | not null | 
 destination_id | uuid                        |           | not null | 
 source_id      | uuid                        |           | not null | 
 tk_id          | uuid                        |           | not null | 

Now I need to get all rows from tb1 which has columns from both tb1 and tb2 when the tk_id value matches for both the rows.

This is what I tried:

select tb1.created_at, tb1.status, tb2.source_id, tb2.destination_id from tb1 
inner join tb2 on tb1.tk_id = tb2.tk_id where 
tb1.created_at > timezone('utc', now()) - interval '40 minutes';

But I am getting way too many rows. Usually in a 40min interval, there would be around 800 records but after the join, I am getting around 100,000+ records.

EDIT: After some reading and a few tries, I made some changes in my query and managed to get the rows down to the expected number of rows. This is my query now

SELECT count(*) FROM tb1 LEFT OUTER JOIN (SELECT DISTINCT tk_id FROM tb2) t2 
ON tb1.tk_id = t2.tk_id where tb1.created_at > timezone('utc', now()) - 
interval '40 minutes';

But now I can't get the columns of tb2 in my select query.

What am I doing wrong?

EDIT2: Sorry if I couldn't make this clearer earlier. The join condition should be based on the latest occurrence of tk_id of the right table. So for every row of left table (tb1), it should match against the latest occurrence of tk_id of right table (tb2) and fetch the right table's columns.

答案1

得分: 1

以下是翻译好的内容:

由于在 tb1 中有明显的多行与 tb1 中的单行匹配,您需要定义要选择哪一行。或者需要一些汇总?

此查询返回所有来自 tb1 的符合条件的行,并从 tb2 中最后创建的一个匹配行添加列(如果有的话):

SELECT tb1.created_at, tb1.status, tb2.*
FROM   tb1
LEFT   JOIN LATERAL (
   SELECT tb2.source_id, tb2.destination_id
   FROM   tb2
   WHERE  tb2.tk_id = tb1.tk_id
   ORDER  BY created_at DESC, id DESC
   LIMIT  1
   ) tb2 ON true
WHERE  tb1.created_at > timezone('utc', now()) - interval '40 minutes';

在这里添加了 id DESC 作为补充,因为 created_at 可能不是唯一的。根据您未透露的需求进行调整。

相关链接:

请在 tb2(tk_id, created_at, id) 上添加索引来支持此查询。当然,还需要在 tb1(created_at) 上添加另一个索引。或者甚至可以使用覆盖索引在 tb1(created_at) 上包含 (tk_id, status) 列。参考:

英文:

Since there are obviously many rows in tb2 for a single row in tb1, you need to define which row to pick. Or some aggregation?

This query returns all qualifying rows from tb1, and adds columns from the one matching row in tb2 that was created last (if any):

SELECT tb1.created_at, tb1.status, tb2.*
FROM   tb1
LEFT   JOIN LATERAL (
   SELECT tb2.source_id, tb2.destination_id
   FROM   tb2
   WHERE  tb2.tk_id = tb1.tk_id
   ORDER  BY created_at DESC, id DESC
   LIMIT  1
   ) tb2 ON true
WHERE  tb1.created_at > timezone('utc', now()) - interval '40 minutes';

Added id DESC as tiebreaker since created_at might not be unique.
Adapt to your undisclosed needs.

Related:

Support this query with an index on tb2(tk_id, created_at, id).
And another index on tb1(created_at), obviously. Or even a covering index on tb1(created_at) INCLUDE (tk_id, status). See:

huangapple
  • 本文由 发表于 2023年6月22日 04:27:34
  • 转载请务必保留本文链接:https://go.coder-hub.com/76526906.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定