为什么将“where”子句移到“join”子句会产生不同的结果。

huangapple go评论66阅读模式
英文:

Why move where clause to join clause produce different result

问题

在SQL Server中,我有两个大表,每个表约有5亿条记录,我期望这两个查询会产生相同的结果,但实际情况不是这样的!

select count(*)
from ff
left join d on
ff.ID = d.id
where ff.sendtime < '2023-01-21' and d.id is null
select count(*)
from ff
left join d on
ff.sendtime < '2023-01-21' and ff.ID = d.id
where d.id is null
英文:

In SQl server I have two big tables with each one about 500,000,000 records,
I expect that these two query produce same result, but not!

select count(*)
from ff
         left join d on
    ff.ID = d.id
where ff.sendtime &lt; &#39;2023-01-21&#39; and d.id is null
select count(*)
from ff
         left join d on
    ff.sendtime &lt; &#39;2023-01-21&#39; and ff.ID = d.id
where d.id is null

答案1

得分: 2

First query

选择所有在2023-01-21之前的ff行。然后根据ID与所有d行进行外连接。然后排除所有在dd中有匹配的ff行。因此,您选择了所有在2023-01-21之前的ff行,其中不存在d行。

Second query

选择所有ff行,不管日期如何。根据ID和ff日期外连接所有d行。然后排除所有在dd中有匹配的ff行。因此,您选择了所有ff行,其senddate大于等于2023-01-21或没有与d匹配的行。

Anti joins

反连接,其中您对一个表进行外连接并仅保留外连接的行,用于选择在该表中没有匹配数据的数据。我认为“NOT EXISTS”或“NOT IN”比连接更易读。这些子句通过将感兴趣的表放在“FROM”子句中,将其他表的查找放在“WHERE”子句中,明确了要选择的数据以及选择数据的条件。

这是我如何编写第一个查询:

select count(*)
from ff
where sendtime < '2023-01-21'
and not exists (select * from d where d.id = ff.id);

这是我如何编写第二个查询:

select count(*)
from ff
where sendtime >= '2023-01-21'
or not exists (select * from d where d.id = ff.id);
英文:

First query

Select all ff rows before 2023-01-21. Then outer join all d rows on the ID. Then dismiss all rows where ff had a match in dd. Thus you select all ff rows before 2023-01-21 for which not exists a d row.

Second query

Select all ff rows, no matter which date. Outer join all d rows based on the ID and the ff date. Then dismiss all rows where ff had a match in dd. Thus you select all ff rows the senddate of which is >= 2023-01-21 or that don't have a match in d.

Anti joins

Anti joins, where you outer join a table and only keep the outer joined rows, are meant to select data where there is no matching data in that table. I consider NOT EXISTS or NOT IN way more readable than the join. These clauses make plain what you want to select and what the condition is for selecting the data, by putting the table of interest in the FROM clause and the lookup of the other table in the WHERE clause.

This is how I'd write the first query:

select count(*)
from ff
where sendtime &lt; &#39;2023-01-21&#39;
and not exists (select * from d where d.id = ff.id);

This is how I'd write the second query:

select count(*)
from ff
where sendtime &gt;= &#39;2023-01-21&#39;
or not exists (select * from d where d.id = ff.id);

答案2

得分: 0

第一个查询中,基于“ID”列执行了表ff和d之间的左连接,而在第二个查询中,则检查“ff.sendtime”是否小于给定时间,即2023-01-21,并且“ID”在两个表中是否匹配。

英文:

In the first query the left join between tables ff and d is performed based on the "ID" column where as in the second it checks "ff.sendtime" is less than given time i.e.., 2023-01-21 and "ID" is matching in two tables or not.

huangapple
  • 本文由 发表于 2023年6月12日 02:58:14
  • 转载请务必保留本文链接:https://go.coder-hub.com/76452058.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定