2023年4月19日 14:42:55go评论63阅读模式

英文:

Doubts about Mysql query statements: EXISTS and IN

问题

我有三个表：jcxx_device_check_log，sys_depart 和 jcxx_device_info。

jcxx_device_check_log 表包含 640,000 条数据。

sys_depart 表包含 260 条数据。

jcxx_device_info 表包含 840 条数据。

exists SQL 查询的 EXPLAIN 格式为 JSON：

"cost_info": { "query_cost": "453972.80" },

in SQL 查询的 EXPLAIN 格式为 JSON：

"cost_info": { "query_cost": "2025375.58" },

这是我的猜测：

exists SQL 查询的结果集不保存在内存中，只返回 true 和 false。

这是我的问题：

我想知道我的猜测是否正确，以及在遇到大量数据时如何选择语句。

英文:

I have three tables jcxx_device_check_log,sys_depart and jcxx_device_info

the jcxx_device_check_log contains 640000 data volume,

the sys_depart contains 260 data volume

the jcxx_device_info contains 840 data volume

the exists sql

EXPLAIN
format = Json
SELECT state,check_time,note,device_id,COUNT(1) FROM jcxx_device_check_log jc
WHERE  exists (
SELECT jdi.id FROM sys_depart sd
INNER JOIN jcxx_device_info jdi on sd.id=jdi.sys_depart_id
where jc.device_id = jdi.id
) and check_time BETWEEN &#39;2023-4-18 10:00:00&#39; AND &#39; 2023-4-18 18:00:00&#39;
GROUP BY device_id,state
ORDER BY device_id,state

the EXPLAIN format = Json results

"cost_info": { "query_cost": "453972.80" },

the in sql

EXPLAIN
format = Json
SELECT state,check_time,device_id,COUNT(1) FROM jcxx_device_check_log jc
WHERE device_id in (
SELECT jdi.id FROM sys_depart sd
INNER JOIN jcxx_device_info jdi on jdi.sys_depart_id=sd.id
) and check_time BETWEEN &#39;2023-4-18 10:00:00&#39; AND &#39; 2023-4-18 18:00:00&#39;
GROUP BY device_id,state
ORDER BY device_id,state;

the EXPLAIN format = Json results

"cost_info": { "query_cost": "2025375.58" },

this is my guess

the result set of exists sql does not exist in the memory and only returns true and false.

this is my question

I want to know whether my guess is correct and how to choose a statement next time I encounter a large amount of data

答案1

得分: 2

以下是您要翻译的内容：

There is no single simple answer to "how to choose a statement next time" as the conditions of different tables/indices/statistics etc. will all influence the outcome. Furthermore, there is a lot of myth and legend around the topics of IN and EXISTS which generally result in a rule of thumb like this: "[The general rule of thumb is that if the subquery contains a large volume of data, the EXISTS operator provides better performance.]"

However, such rules of thumb are difficult (if not impossible) to prove and yet again may not be true if conditions are suited to use of IN() (e.g. if the in lists an indexed column and the subquery itself is inherently efficient). So be wary of such "rules" and as another prior answer states it:

[The allegedly performance and technical differences between EXISTS and IN may result from specific vendor's implementations/limitations/bugs, but many times they are nothing but myths created due to lack of understanding of the databases internals.]

You are correct about the YES/NO nature of using EXISTS, it merely tests if a match can be found, and if found returns TRUE, otherwise FALSE. However, we cannot know if the subquery used is in memory or not.

For knowing when to choose IN or EXISTS, in truth there is no absolute guide, and all I can do is list out some guidance:

If the "selective predicates(s)" are self-contained in the subquery, use IN, e.g.,
```
select t1.* 
from t1 
where t1.x in (select t2.x from t2 where t2.y = 'foo')
```
If the "selective predicates(s)" are from the outer table, use EXISTS, e.g.,
```
select t1.* 
from t1 
where exists (select null from t2 where t2.x=t1.x)
```

Where the relationships between the outer and inner tables are complex, use EXISTS, e.g.,

select t1.* 
from t1 
where exists (select null 
             from t2 
             where t2.x=t1.x 
                   and t2.y>t1.y 
                   and t2.z like '%' || t1.z || '%')

If seeking the negative of IN i.e. NOT IN, be aware that if the subquery returns NULL, then the whole result will be NULL (and this can be unexpected), so take steps to exclude NULL from the subquery result, e.g.,
```
select t1.* 
from t1 
where t1.x NOT in (select t2.x from t2 where t2.x IS NOT NULL)
```

NB: The term "selective predicate" is used to mean any predicate(s) that filter out a significant number of rows.

I recommend you also review the many prior answers to this: "Difference between EXISTS and IN in SQL."

英文:

There is no single simple answer to "how to choose a statement next time" as the conditions of different tables/indices/statistics etc. will all influence the outcome. Furthermore, there is a lot of myth and legend around the topics of IN and EXISTS which generally result in a rule of thumb like this: "*The general rule of thumb is that if the subquery contains a large volume of data, the EXISTS operator provides a better performance."

However such rules of thumb are difficult (if not impossible) to prove and yet again may not be true if conditions are suited to use of IN() (e.g. if the in lists an indexed column and the subquery itself is inherently efficient). So be wary of such "rules" and as another prior answer states it:

> The allegedly performance and technical differences between EXISTS and IN may result from specific vendor's implementations/limitations/bugs, but many times they are nothing but myths created due to lack of understanding of the databases internals.

You are correct about the YES/NO nature of using EXISTS, it merely tests if a match can be found, and if found returns TRUE, otherwise FALSE. However we cannot know if the subquery used is in memory or not.

For knowing when to choose IN or EXISTS, in truth there is no absolute guide and all I can do is list out some guidance:

If the "selective predicates(s)" is/are self-contained to the subquery use IN e.g.
<pre>
select t1.*
from t1
where t1.x in (select t2.x from t2 where t2.y = 'foo')
</pre>
If the "selective predicates(s)" is/are from the outer table, use EXISTS (refer)e.g.
<pre>
select t1.*
from t1
where exists (select null
from t2
where t2.x=t1.x
)
</pre>
Also note here that an EXISTS does not actually need to return any values for evaluation so you may use select null or select 1 or select * as you prefer. In older variants of some dbs there was a penalty is using *, but this is not the case for most (or all?) dbs today.
Where the relationships between the outer and inner tables are complex use EXISTS (nb: borrowed from this answer)e.g.
<pre>
select t1.*
from t1
where exists (select null
from t2
where t2.x=t1.x
and t2.y>t1.y
and t2.z like '℅' || t1.z || '℅'
)
</pre>
Note that to approach an equivalence of that using IN is so much more cumbersome (and might not be allowed as t1 is referenced 2 tiers down):
<pre>
select t1.*
from t1
where t1.x in (select t2.x
from t2
where t2.x=t1.x
and t2.y > (select t1.y from t1 where t1.x = t2.x)
and t2.z like '%' || (select t1.z from t1 where t1.x = t2.x) || '%'
)</pre>
If seeking the negative of IN i.e. NOT IN be aware that if the subquery returns NULL then the whole result will be NULL (and this can be unexpected) so take steps to exclude NULL from the subquery result> e.g.
<pre>
select t1.*
from t1
where t1.x NOT in (select t2.x from t2 where t2.x IS NOT NULL)
</pre>

nb: the term "selective predicate" is used to mean any predicate(s) that filters out a significant number of rows.

I recommend you also review the many prior answers to this: "Difference between EXISTS and IN in SQL?"

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

关于Mysql查询语句的疑问：EXISTS和IN

问题

答案1

SQL：如何选择每个客户的最后购买的产品？

使用antlr4生成Go代码时遇到了符号类型冲突错误。

插入数据到数据库和executeUpdate()方法中的错误

在SQL中出现语法错误，但我知道语法应该是正确的。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论