英文:
Why move where clause to join clause produce different result
问题
在SQL Server中,我有两个大表,每个表约有5亿条记录,我期望这两个查询会产生相同的结果,但实际情况不是这样的!
select count(*)
from ff
left join d on
ff.ID = d.id
where ff.sendtime < '2023-01-21' and d.id is null
select count(*)
from ff
left join d on
ff.sendtime < '2023-01-21' and ff.ID = d.id
where d.id is null
英文:
In SQl server I have two big tables with each one about 500,000,000 records,
I expect that these two query produce same result, but not!
select count(*)
from ff
left join d on
ff.ID = d.id
where ff.sendtime < '2023-01-21' and d.id is null
select count(*)
from ff
left join d on
ff.sendtime < '2023-01-21' and ff.ID = d.id
where d.id is null
答案1
得分: 2
First query
选择所有在2023-01-21之前的ff行。然后根据ID与所有d行进行外连接。然后排除所有在dd中有匹配的ff行。因此,您选择了所有在2023-01-21之前的ff行,其中不存在d行。
Second query
选择所有ff行,不管日期如何。根据ID和ff日期外连接所有d行。然后排除所有在dd中有匹配的ff行。因此,您选择了所有ff行,其senddate大于等于2023-01-21或没有与d匹配的行。
Anti joins
反连接,其中您对一个表进行外连接并仅保留外连接的行,用于选择在该表中没有匹配数据的数据。我认为“NOT EXISTS”或“NOT IN”比连接更易读。这些子句通过将感兴趣的表放在“FROM”子句中,将其他表的查找放在“WHERE”子句中,明确了要选择的数据以及选择数据的条件。
这是我如何编写第一个查询:
select count(*)
from ff
where sendtime < '2023-01-21'
and not exists (select * from d where d.id = ff.id);
这是我如何编写第二个查询:
select count(*)
from ff
where sendtime >= '2023-01-21'
or not exists (select * from d where d.id = ff.id);
英文:
First query
Select all ff rows before 2023-01-21. Then outer join all d rows on the ID. Then dismiss all rows where ff had a match in dd. Thus you select all ff rows before 2023-01-21 for which not exists a d row.
Second query
Select all ff rows, no matter which date. Outer join all d rows based on the ID and the ff date. Then dismiss all rows where ff had a match in dd. Thus you select all ff rows the senddate of which is >= 2023-01-21 or that don't have a match in d.
Anti joins
Anti joins, where you outer join a table and only keep the outer joined rows, are meant to select data where there is no matching data in that table. I consider NOT EXISTS
or NOT IN
way more readable than the join. These clauses make plain what you want to select and what the condition is for selecting the data, by putting the table of interest in the FROM
clause and the lookup of the other table in the WHERE
clause.
This is how I'd write the first query:
select count(*)
from ff
where sendtime < '2023-01-21'
and not exists (select * from d where d.id = ff.id);
This is how I'd write the second query:
select count(*)
from ff
where sendtime >= '2023-01-21'
or not exists (select * from d where d.id = ff.id);
答案2
得分: 0
第一个查询中,基于“ID”列执行了表ff和d之间的左连接,而在第二个查询中,则检查“ff.sendtime”是否小于给定时间,即2023-01-21,并且“ID”在两个表中是否匹配。
英文:
In the first query the left join between tables ff and d is performed based on the "ID" column where as in the second it checks "ff.sendtime" is less than given time i.e.., 2023-01-21 and "ID" is matching in two tables or not.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论