关于MySQL复合索引和左侧前缀匹配的困惑

huangapple go评论69阅读模式
英文:

Confused about mysql compsite index and leftmost prefix matching

问题

我的MySQL版本是5.7,我创建了一个包含1,332,660条记录的测试表格:

CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`data_name` varchar(500) DEFAULT NULL,
`data_time` varchar(100) DEFAULT NULL,
`data_value` decimal(50,8) DEFAULT NULL,
`data_code` varchar(100) DEFAULT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_name_time_value` (`data_name`,`data_time`,`data_value`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

其中有一个由data_namedata_timedata_value构成的索引。

我有三个SQL查询以及它们的结果:

  1. 解释说明第一个SQL查询,其中条件是:data_name = 'abc',data_time = '2022-06-15 00:00:00',data_value = 75.1。
  2. 解释说明第二个SQL查询,其中条件是:data_time = '2022-06-15 00:00:00',data_value = 75.1。
  3. 解释说明第三个SQL查询,其中条件是:data_time = '2022-06-15 00:00:00',data_value = 75.1。

根据“最左前缀匹配”,第二个SQL查询没有使用索引,但第三个SQL查询实际上使用了索引。

我的问题是为什么第三个SQL查询可以使用索引,即使在使用索引时仍然发生了全表扫描(第二个查询的Explain结果中的rows等于第三个查询的rows)。

有人可以帮助我吗?非常感谢!!!!

英文:

My MySQL version is 5.7 and I created a test table with 1,332,660 records:

CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`data_name` varchar(500) DEFAULT NULL,
`data_time` varchar(100) DEFAULT NULL,
`data_value` decimal(50,8) DEFAULT NULL,
`data_code` varchar(100) DEFAULT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_name_time_value` (`data_name`,`data_time`,`data_value`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

there is an index bydata_name,data_time,data_value.
I have three SQL and there results :

explain select  *  from test    where data_name ='abc' and  data_time = '2022-06-15 00:00:00' and data_value=75.1

关于MySQL复合索引和左侧前缀匹配的困惑

explain select  *  from test    where   data_time = '2022-06-15 00:00:00' and data_value=75.1

关于MySQL复合索引和左侧前缀匹配的困惑

explain select  data_name  from test    where   data_time = '2022-06-15 00:00:00' and data_value=75.1

关于MySQL复合索引和左侧前缀匹配的困惑

According to the leftmost prefix matching the second sql does not use index, but the third one actually used an index.

My question is why can the thired sql use index, and why does a full table scan occur even when indexes are used (The seconde one's Explain result rows is equal to the thired one's rows).

Can someone help me ? Thans a lot!!!!!!

答案1

得分: 1

查询3 - 为什么它使用索引但很慢...

select  data_name
    from  test
    where  data_time = '2022-06-15 00:00:00'
      and  data_value=75.1

需要3列,而且它们都在一个单独的索引中。因此,使用该索引可能比扫描表格快。这被称为"覆盖索引"。在EXPLAIN中以Using index标示出来。"最左前缀"不相关;"覆盖"是。

然而,它将不得不扫描整个索引,如 "Rows" 指示的那样,大约是表中的行数。

你的日期时间字符串看起来像是MySQL的日期时间;最好将列声明为 DATETIME 而不是 VARCHAR

查询1可能会很快,因为你在索引中测试了所有3列。

查询2可能会很慢,因为WHERE中的列不是最左前缀。

select  *
    from  test
    where  data_time = '2022-06-15 00:00:00'
      and  data_value=75.1

以下两者都会有好处:

INDEX(data_time, data_value)
INDEX(data_value, data_time)

查询4:

where  data_name ='abc'
  and  data_time = '2022-06-15 00:00:00'
  and  data_value >= 75
  and  data_value <  76

这将很快,因为"最左前缀"允许使用索引。

查询5:

where  data_name ='abc'
  and  data_time >= '2022-06-15'
  and  data_value = 75.1

这将只会有些快,因为"最左前缀"将在 data_time 处停止。也就是说,在第一个"范围"测试之后停止了。添加以下内容将使查询5变快并充分使用索引:

INDEX(data_name, data_value,   -- 用 "=" 进行测试
      data_time)               -- 用 "范围" 进行测试

请注意,索引中列的顺序很重要;WHERE 中测试的顺序并不重要。

英文:

Query 3 -- Why it uses the index but is slow...

select  data_name
    from  test
    where  data_time = &#39;2022-06-15 00:00:00&#39;
      and  data_value=75.1

needs 3 columns, and all are in a single index. So using that index is probably faster than scanning the table. This is called a "covering index". It is indicated in EXPLAIN with Using index. "Leftmost" is not relevant; "covering" is.

Still, it will have to scan the entire index, as indicated by "Rows" being approximately the number of rows in the table.

Your datetime string looks like MySQL's datetime; it would be better to declare the column DATETIME instead of VARCHAR.

Query 1 is likely to be fast since you are testing all 3 columns of the index with =.

Query 2 is likely to be very slow since the columns in the WHERE are not "leftmost".

select  *
    from  test
    where  data_time = &#39;2022-06-15 00:00:00&#39;
      and  data_value=75.1

would benefit either of these:

INDEX(data_time, data_value)
INDEX(data_value, data_time)

Query 4:

    where  data_name =&#39;abc&#39;
      and  data_time = &#39;2022-06-15 00:00:00&#39;
      and  data_value &gt;= 75
      and  data_value &lt;  76

This will be fast because "leftmost" lets the index be used.

Query 5:

    where  data_name =&#39;abc&#39;
      and  data_time &gt;= &#39;2022-06-15&#39;
      and  data_value = 75.1

This will be only somewhat fast because "leftmost" will stop with the data_time. That is, it stopped after the first "range" test. Adding this would make query 5 fast and fully use the index:

   INDEX(data_name, data_value,   -- tested with &quot;=&quot;
         data_time)               -- tested with &quot;range&quot;

Note that the order of columns in the index important; the order of tests in WHERE is not.

huangapple
  • 本文由 发表于 2023年7月20日 16:42:16
  • 转载请务必保留本文链接:https://go.coder-hub.com/76728115.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定