July 20, 2023, 16:42:16

Confused about MySQL composite index and leftmost prefix matching

Question

My MySQL version is 5.7 and I created a test table with 1,332,660 records:

CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`data_name` varchar(500) DEFAULT NULL,
`data_time` varchar(100) DEFAULT NULL,
`data_value` decimal(50,8) DEFAULT NULL,
`data_code` varchar(100) DEFAULT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_name_time_value` (`data_name`,`data_time`,`data_value`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

There is a composite index on data_name, data_time, data_value.
I have three SQL statements and their EXPLAIN results:

explain select * from test where data_name = 'abc' and data_time = '2022-06-15 00:00:00' and data_value = 75.1

explain select * from test where data_time = '2022-06-15 00:00:00' and data_value = 75.1

explain select data_name from test where data_time = '2022-06-15 00:00:00' and data_value = 75.1

According to leftmost prefix matching, the second query does not use the index, but the third one actually does use an index.

My question is: why can the third query use the index, and why does a full scan still occur even when the index is used? (The rows value in the second query's EXPLAIN result is equal to the third one's.)

Can someone help me? Thanks a lot!

Answer 1

Score: 1

Query 3 -- Why it uses the index but is slow...

select  data_name
    from  test
    where  data_time = '2022-06-15 00:00:00'
      and  data_value=75.1

needs 3 columns (data_name, data_time, data_value), and all are in a single index. So scanning that index is probably faster than scanning the table. This is called a "covering index". It is indicated in EXPLAIN with "Using index" in the Extra column. "Leftmost" is not relevant; "covering" is.

Still, it will have to scan the entire index, as indicated by "Rows" being approximately the number of rows in the table.
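One way to see the difference is to run EXPLAIN on queries 2 and 3 side by side. The sketch below assumes the test table from the question; the comments describe what MySQL 5.7 would be expected to report given the explanation above, not output copied from a real run.

```sql
-- Query 2: SELECT * needs data_code and create_time, which are not in the index,
-- so the optimizer is expected to read the whole table (type = ALL).
EXPLAIN SELECT * FROM test
WHERE data_time = '2022-06-15 00:00:00' AND data_value = 75.1;

-- Query 3: every column it touches is in idx_name_time_value, so the optimizer
-- can read the whole index instead (type = index, Extra includes "Using index").
EXPLAIN SELECT data_name FROM test
WHERE data_time = '2022-06-15 00:00:00' AND data_value = 75.1;
```

In both cases the rows estimate stays close to the table's row count, because neither query gives the optimizer a starting point to seek to in the index.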

Your datetime string looks like MySQL's datetime; it would be better to declare the column DATETIME instead of VARCHAR.
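If the column is converted, the change could look like the sketch below. It assumes every existing data_time value is a valid 'YYYY-MM-DD hh:mm:ss' string; a value that is not would likely make the ALTER fail under strict SQL mode, so trying it on a copy of the table first is advisable.

```sql
-- Count values that do not parse as a datetime (ideally 0 before converting).
SELECT COUNT(*) FROM test
WHERE data_time IS NOT NULL
  AND STR_TO_DATE(data_time, '%Y-%m-%d %H:%i:%s') IS NULL;

-- Change the column type; this rebuilds the table, including idx_name_time_value.
ALTER TABLE test MODIFY COLUMN data_time DATETIME DEFAULT NULL;
```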

Query 1 is likely to be fast since you are testing all 3 columns of the index with =.

Query 2 is likely to be very slow since the columns in the WHERE are not "leftmost".

select  *
    from  test
    where  data_time = '2022-06-15 00:00:00'
      and  data_value=75.1

would benefit from either of these:

INDEX(data_time, data_value)
INDEX(data_value, data_time)
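As a rough sketch, one of these indexes could be added and checked as follows. The index name idx_time_value is a hypothetical choice, not something from the question.

```sql
-- idx_time_value is a hypothetical name; either column order works for two "=" tests.
ALTER TABLE test ADD INDEX idx_time_value (data_time, data_value);

-- Re-run query 2 under EXPLAIN to see whether the new index is chosen.
EXPLAIN SELECT * FROM test
WHERE data_time = '2022-06-15 00:00:00' AND data_value = 75.1;
```

If the optimizer picks the new index, EXPLAIN should show it in the key column, with a rows estimate far below the table size.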

Query 4:

    where  data_name ='abc'
      and  data_time = '2022-06-15 00:00:00'
      and  data_value >= 75
      and  data_value <  76

This will be fast because "leftmost" lets the index be used: the two = tests come first in the index, and the range test is on the last column.

Query 5:

    where  data_name ='abc'
      and  data_time >= '2022-06-15'
      and  data_value = 75.1

This will be only somewhat fast because "leftmost" will stop at data_time. That is, index use stops after the first "range" test, so data_value = 75.1 cannot narrow the range of index entries scanned. Adding this would make query 5 fast and fully use the index:

   INDEX(data_name, data_value,   -- tested with "="
         data_time)               -- tested with "range"
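A minimal sketch of adding that index and checking query 5 against it, assuming the test table from the question. The index name idx_name_value_time is hypothetical.

```sql
-- idx_name_value_time is a hypothetical name.
-- The "=" columns come first and the range column comes last.
ALTER TABLE test ADD INDEX idx_name_value_time (data_name, data_value, data_time);

-- Query 5 under EXPLAIN: compare key and key_len with the run before the index existed.
EXPLAIN SELECT * FROM test
WHERE data_name = 'abc'
  AND data_time >= '2022-06-15'
  AND data_value = 75.1;
```

The key_len column reports how many bytes of the index key are used, so a larger value for the same query suggests that more of the index columns are taking part in the lookup.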

Note that the order of columns in the index is important; the order of tests in WHERE is not.
