英文:
JOIN with conditions vs WHERE clause
问题
Query #1:
选择 t1.PK
从 table1 as t1
加入 table2 as t2 在 t1.pk = t2.pk ....
其中 t1.someattribute = somecondition
Query #2:
选择 t1.PK
从 table1 as t1
加入 table2 as t2 在 (t1.pk = t2.pk
并且 t1.someattribute = somecondition)
哪一个性能更好?我的理解是第二个更好,因为它在连接这两个表的同时选择要处理的行,因此它不会在内存中保留大量的行(取决于表的大小),而在处理 where
子句时会被丢弃,这样只会浪费内存。
但是当我测试它们时,我没有看到很大的差异。是否有人可以确认/否定在连接发生时添加 AND 是否会减少表处理的行数的前提是真实的?
英文:
I have two queries.
Query #1:
select t1.PK
from table1 as t1
join table2 as t2 on t1.pk = t2.pk ....
where t1.someattribute = somecondition
Query #2:
select t1.PK
from table1 as t1
join table2 as t2 on (t1.pk = t2.pk
and t1.someattribute = somecondition)
Which one would perform better? My understanding was that the second one was better because while it is joining the two tables it is selecting which rows it wants to work with and therefore it won't keep in memory a huge number of rows (depending on the size of the table) that when the where
clause is processed will be discarded and only were kept in memory for nothing.
But when I test them I don't see a big difference. Could anybody confirm/deny if the premise that adding an AND while the join is happening reduces the number of rows that the table works with is true or not?
答案1
得分: 1
SQL是一种声明性语言。你通过SQL描述你想要什么,而接收SQL的系统决定如何为你获取它。换句话说,你不太关心它是如何获得结果的。因此,在几乎所有的关系数据库管理系统中,这两个语句是等效的。
在解析SQL之后,系统将执行优化步骤,以尽快执行。任何值得一提的关系数据库管理系统都会执行一个叫做“谓词推送”的优化步骤,在这个步骤中,它将取出WHERE子句中的条件,并确定在读取表时是否可以应用这些条件于数据选择。
如果选择了所有内容,然后在所有连接后才查看WHERE子句,那将有点愚蠢。总之,微软的目标不是不必要地消耗计算资源,或许是吧,我不在那里工作。
无论如何,我没有Azure实例来测试,但如果你想确认的话,你可以对这两个查询运行一个解释/执行计划,亲自查看是否有任何差异。很可能它们会相同。
英文:
SQL is a declarative language. You describe what you want via SQL, and the system to which you submit that SQL decides how to get it for you. In other words, you don't get much say about HOW it gets to that result. As such, in just about every RDBMS out there, these two statements are synonymous.
After parsing the sql, the system will perform optimization steps to execute as quickly as possible. Any RDBMS worth its salt will perform an optimization step called "predicate pushdown" in which it will take conditions in the WHERE clause and determine if it can apply them to data selection when it reads the tables.
It would be kind of silly to select EVERYTHING and only after shuffling the data together through all the joins finally look at the WHERE clause. At the end of the day Microsoft isn't in the cloud business to burn compute unnecessarily, or maybe it is. I don't work there.
At any rate, I don't have an Azure instance to test, but if you want to confirm you can run an explain/execution plan for both queries and see for yourself if there any differences. Likely they will be the same.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论