关于Mysql查询语句的疑问:EXISTS和IN

huangapple go评论63阅读模式
英文:

Doubts about Mysql query statements: EXISTS and IN

问题

我有三个表:jcxx_device_check_log,sys_depart 和 jcxx_device_info。

jcxx_device_check_log 表包含 640,000 条数据。

sys_depart 表包含 260 条数据。

jcxx_device_info 表包含 840 条数据。

exists SQL 查询的 EXPLAIN 格式为 JSON:

"cost_info": { "query_cost": "453972.80" },

in SQL 查询的 EXPLAIN 格式为 JSON:

"cost_info": { "query_cost": "2025375.58" },

这是我的猜测:

exists SQL 查询的结果集不保存在内存中,只返回 true 和 false。

这是我的问题:

我想知道我的猜测是否正确,以及在遇到大量数据时如何选择语句。

英文:

I have three tables jcxx_device_check_log,sys_depart and jcxx_device_info

the jcxx_device_check_log contains 640000 data volume,

the sys_depart contains 260 data volume

the jcxx_device_info contains 840 data volume

the exists sql

EXPLAIN
format = Json
SELECT state,check_time,note,device_id,COUNT(1) FROM jcxx_device_check_log jc
WHERE  exists (
SELECT jdi.id FROM sys_depart sd
INNER JOIN jcxx_device_info jdi on sd.id=jdi.sys_depart_id
where jc.device_id = jdi.id
) and check_time BETWEEN '2023-4-18 10:00:00' AND ' 2023-4-18 18:00:00'
GROUP BY device_id,state
ORDER BY device_id,state

the EXPLAIN format = Json results

"cost_info": {
"query_cost": "453972.80"
},

the in sql

EXPLAIN
format = Json
SELECT state,check_time,device_id,COUNT(1) FROM jcxx_device_check_log jc
WHERE device_id in (
SELECT jdi.id FROM sys_depart sd
INNER JOIN jcxx_device_info jdi on jdi.sys_depart_id=sd.id
) and check_time BETWEEN '2023-4-18 10:00:00' AND ' 2023-4-18 18:00:00'
GROUP BY device_id,state
ORDER BY device_id,state;

the EXPLAIN format = Json results

"cost_info": {
"query_cost": "2025375.58"
},

this is my guess

the result set of exists sql does not exist in the memory and only returns true and false.

this is my question

I want to know whether my guess is correct and how to choose a statement next time I encounter a large amount of data

答案1

得分: 2

以下是您要翻译的内容:

There is no single simple answer to "how to choose a statement next time" as the conditions of different tables/indices/statistics etc. will all influence the outcome. Furthermore, there is a lot of myth and legend around the topics of IN and EXISTS which generally result in a rule of thumb like this: "[The general rule of thumb is that if the subquery contains a large volume of data, the EXISTS operator provides better performance.]"

However, such rules of thumb are difficult (if not impossible) to prove and yet again may not be true if conditions are suited to use of IN() (e.g. if the in lists an indexed column and the subquery itself is inherently efficient). So be wary of such "rules" and as another prior answer states it:

[The allegedly performance and technical differences between EXISTS and IN may result from specific vendor's implementations/limitations/bugs, but many times they are nothing but myths created due to lack of understanding of the databases internals.]

You are correct about the YES/NO nature of using EXISTS, it merely tests if a match can be found, and if found returns TRUE, otherwise FALSE. However, we cannot know if the subquery used is in memory or not.

For knowing when to choose IN or EXISTS, in truth there is no absolute guide, and all I can do is list out some guidance:

  1. If the "selective predicates(s)" are self-contained in the subquery, use IN, e.g.,

    select t1.* 
    from t1 
    where t1.x in (select t2.x from t2 where t2.y = 'foo')
    
  2. If the "selective predicates(s)" are from the outer table, use EXISTS, e.g.,

    select t1.* 
    from t1 
    where exists (select null from t2 where t2.x=t1.x)
    
  3. Where the relationships between the outer and inner tables are complex, use EXISTS, e.g.,

    select t1.* 
    from t1 
    where exists (select null 
                 from t2 
                 where t2.x=t1.x 
                       and t2.y>t1.y 
                       and t2.z like '%' || t1.z || '%')
    
  4. If seeking the negative of IN i.e. NOT IN, be aware that if the subquery returns NULL, then the whole result will be NULL (and this can be unexpected), so take steps to exclude NULL from the subquery result, e.g.,

    select t1.* 
    from t1 
    where t1.x NOT in (select t2.x from t2 where t2.x IS NOT NULL)
    
  • NB: The term "selective predicate" is used to mean any predicate(s) that filter out a significant number of rows.

I recommend you also review the many prior answers to this: "Difference between EXISTS and IN in SQL."

英文:

There is no single simple answer to "how to choose a statement next time" as the conditions of different tables/indices/statistics etc. will all influence the outcome. Furthermore, there is a lot of myth and legend around the topics of IN and EXISTS which generally result in a rule of thumb like this: "*The general rule of thumb is that if the subquery contains a large volume of data, the EXISTS operator provides a better performance."

However such rules of thumb are difficult (if not impossible) to prove and yet again may not be true if conditions are suited to use of IN() (e.g. if the in lists an indexed column and the subquery itself is inherently efficient). So be wary of such "rules" and as another prior answer states it:

> The allegedly performance and technical differences between EXISTS and IN may result from specific vendor's implementations/limitations/bugs, but many times they are nothing but myths created due to lack of understanding of the databases internals.

You are correct about the YES/NO nature of using EXISTS, it merely tests if a match can be found, and if found returns TRUE, otherwise FALSE. However we cannot know if the subquery used is in memory or not.

For knowing when to choose IN or EXISTS, in truth there is no absolute guide and all I can do is list out some guidance:

  1. If the "selective predicates(s)" is/are self-contained to the subquery use IN e.g.
    <pre>
    select t1.*
    from t1
    where t1.x in (select t2.x from t2 where <b>t2.y = 'foo'</b>)
    </pre>

  2. If the "selective predicates(s)" is/are from the outer table, use EXISTS (refer)e.g.
    <pre>
    select t1.*
    from t1
    where exists (select null
    from t2
    where <b>t2.x=t1.x</b>
    )
    </pre>
    Also note here that an EXISTS does not actually need to return any values for evaluation so you may use select null or select 1 or select * as you prefer. In older variants of some dbs there was a penalty is using *, but this is not the case for most (or all?) dbs today.

  3. Where the relationships between the outer and inner tables are complex use EXISTS (nb: borrowed from this answer)e.g.
    <pre>
    select t1.*
    from t1
    where exists (select null
    from t2
    where <b>t2.x=t1.x
    and t2.y>t1.y
    and t2.z like '℅' || t1.z || '℅'</b>
    )
    </pre>
    Note that to approach an equivalence of that using IN is so much more cumbersome (and might not be allowed as t1 is referenced 2 tiers down):
    <pre>
    select t1.*
    from t1
    where t1.x in (select t2.x
    from t2
    where <b>t2.x=t1.x
    and t2.y > (select t1.y from t1 where t1.x = t2.x)
    and t2.z like '%' || (select t1.z from t1 where t1.x = t2.x) || '%'</b>
    )</pre>

  4. If seeking the negative of IN i.e. NOT IN be aware that if the subquery returns NULL then the whole result will be NULL (and this can be unexpected) so take steps to exclude NULL from the subquery result> e.g.
    <pre>
    select t1.*
    from t1
    where t1.x <b>NOT</b> in (select t2.x from t2 where <b>t2.x IS NOT NULL</b>)
    </pre>

  • nb: the term "selective predicate" is used to mean any predicate(s) that filters out a significant number of rows.

I recommend you also review the many prior answers to this: "Difference between EXISTS and IN in SQL?"

huangapple
  • 本文由 发表于 2023年4月19日 14:42:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/76051448.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定