Subqueries vs Multi Table Join
Question
I have three tables, A, B, and C, and I want to count the ids that appear in all three (their intersection).
Way 1:
select count(a.id) from A a join B b on a.id = b.id join C c on b.id = c.id;
Result count: X
Way 2:
SELECT count(id) FROM A WHERE id IN (SELECT id FROM B WHERE id IN (SELECT id FROM C));
Result count: Y
The two queries return different counts. What exactly is wrong?
Answer 1
Score: 2
A JOIN can multiply the number of rows as well as filter rows out.
In this case, the second count should be the correct one because nothing is double counted, assuming id is unique in a. If it is not, the query needs count(distinct a.id).
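To see the row multiplication concretely, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The sample rows are made up for illustration (they are not from the question): id is unique in A, but B and C each contain a duplicate.

```python
import sqlite3

# Hypothetical sample data: A has unique ids, B and C contain duplicates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER);
    CREATE TABLE C (id INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1), (1), (2);
    INSERT INTO C VALUES (1), (2), (2);
""")

# Way 1: every matching combination of rows from A, B and C is counted.
join_count = conn.execute(
    "SELECT count(a.id) FROM A a "
    "JOIN B b ON a.id = b.id "
    "JOIN C c ON b.id = c.id"
).fetchone()[0]

# Way 2: each row of A is counted at most once.
in_count = conn.execute(
    "SELECT count(id) FROM A "
    "WHERE id IN (SELECT id FROM B WHERE id IN (SELECT id FROM C))"
).fetchone()[0]

print(join_count, in_count)  # 4 2
conn.close()
```

With this data, id 1 matches two rows in B and id 2 matches two rows in C, so the join yields four rows even though only two ids are common to all three tables.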
The equivalent using JOIN would use COUNT(DISTINCT):
select count(distinct a.id)
from A a
join B b on a.id = b.id
join C c on b.id = c.id;
I mention this for completeness but do not recommend this approach. Multiplying the number of rows just to remove them again using distinct is inefficient.
In many databases, the most efficient method might be:
select count(*)
from a
where exists (select 1 from b where b.id = a.id) and
exists (select 1 from c where c.id = a.id);
Note: This assumes there are indexes on the id columns and that id is unique in a.
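The uniqueness assumption can be checked with a small sketch, again using Python's sqlite3 module and made-up rows. The helper name counts and the sample ids are hypothetical; the two queries are the EXISTS and COUNT(DISTINCT) forms from this answer.

```python
import sqlite3

def counts(a_ids):
    """Return (exists_count, distinct_count) for the given ids in table a."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER);
        CREATE TABLE b (id INTEGER);
        CREATE TABLE c (id INTEGER);
        INSERT INTO b VALUES (1), (1), (2);
        INSERT INTO c VALUES (1), (2), (2);
    """)
    conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in a_ids])

    # EXISTS form: one output row per row of a that has a match in b and c.
    exists_count = conn.execute(
        "SELECT count(*) FROM a "
        "WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id) "
        "AND EXISTS (SELECT 1 FROM c WHERE c.id = a.id)"
    ).fetchone()[0]

    # JOIN form: duplicates created by the join are collapsed by DISTINCT.
    distinct_count = conn.execute(
        "SELECT count(DISTINCT a.id) FROM a "
        "JOIN b ON a.id = b.id "
        "JOIN c ON b.id = c.id"
    ).fetchone()[0]

    conn.close()
    return exists_count, distinct_count

print(counts([1, 2, 3]))     # (2, 2): id is unique in a, both forms agree
print(counts([1, 2, 2, 3]))  # (3, 2): duplicate id 2 in a is counted twice by EXISTS
```

When id is unique in a, the two forms return the same number. Once a contains a repeated id, count(*) with EXISTS counts each copy, which is why the note about uniqueness matters.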