英文:
Confused about mysql compsite index and leftmost prefix matching
问题
我的MySQL版本是5.7,我创建了一个包含1,332,660条记录的测试表格:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`data_name` varchar(500) DEFAULT NULL,
`data_time` varchar(100) DEFAULT NULL,
`data_value` decimal(50,8) DEFAULT NULL,
`data_code` varchar(100) DEFAULT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_name_time_value` (`data_name`,`data_time`,`data_value`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
其中有一个由data_name
、data_time
和data_value
构成的索引。
我有三个SQL查询以及它们的结果:
- 解释说明第一个SQL查询,其中条件是:data_name = 'abc',data_time = '2022-06-15 00:00:00',data_value = 75.1。
- 解释说明第二个SQL查询,其中条件是:data_time = '2022-06-15 00:00:00',data_value = 75.1。
- 解释说明第三个SQL查询,其中条件是:data_time = '2022-06-15 00:00:00',data_value = 75.1。
根据“最左前缀匹配”,第二个SQL查询没有使用索引,但第三个SQL查询实际上使用了索引。
我的问题是为什么第三个SQL查询可以使用索引,即使在使用索引时仍然发生了全表扫描(第二个查询的Explain结果中的rows
等于第三个查询的rows
)。
有人可以帮助我吗?非常感谢!!!!
英文:
My MySQL version is 5.7 and I created a test table with 1,332,660 records:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`data_name` varchar(500) DEFAULT NULL,
`data_time` varchar(100) DEFAULT NULL,
`data_value` decimal(50,8) DEFAULT NULL,
`data_code` varchar(100) DEFAULT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_name_time_value` (`data_name`,`data_time`,`data_value`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
there is an index bydata_name
,data_time
,data_value
.
I have three SQL and there results :
explain select * from test where data_name ='abc' and data_time = '2022-06-15 00:00:00' and data_value=75.1
explain select * from test where data_time = '2022-06-15 00:00:00' and data_value=75.1
explain select data_name from test where data_time = '2022-06-15 00:00:00' and data_value=75.1
According to the leftmost prefix matching
the second sql does not use index, but the third one actually used an index.
My question is why can the thired sql use index, and why does a full table scan occur even when indexes are used (The seconde one's Explain result rows
is equal to the thired one's rows
).
Can someone help me ? Thans a lot!!!!!!
答案1
得分: 1
查询3 - 为什么它使用索引但很慢...
select data_name
from test
where data_time = '2022-06-15 00:00:00'
and data_value=75.1
需要3列,而且它们都在一个单独的索引中。因此,使用该索引可能比扫描表格快。这被称为"覆盖索引"。在EXPLAIN
中以Using index
标示出来。"最左前缀"不相关;"覆盖"是。
然而,它将不得不扫描整个索引,如 "Rows" 指示的那样,大约是表中的行数。
你的日期时间字符串看起来像是MySQL的日期时间;最好将列声明为 DATETIME
而不是 VARCHAR
。
查询1可能会很快,因为你在索引中测试了所有3列。
查询2可能会很慢,因为WHERE
中的列不是最左前缀。
select *
from test
where data_time = '2022-06-15 00:00:00'
and data_value=75.1
以下两者都会有好处:
INDEX(data_time, data_value)
INDEX(data_value, data_time)
查询4:
where data_name ='abc'
and data_time = '2022-06-15 00:00:00'
and data_value >= 75
and data_value < 76
这将很快,因为"最左前缀"允许使用索引。
查询5:
where data_name ='abc'
and data_time >= '2022-06-15'
and data_value = 75.1
这将只会有些快,因为"最左前缀"将在 data_time
处停止。也就是说,在第一个"范围"测试之后停止了。添加以下内容将使查询5变快并充分使用索引:
INDEX(data_name, data_value, -- 用 "=" 进行测试
data_time) -- 用 "范围" 进行测试
请注意,索引中列的顺序很重要;WHERE
中测试的顺序并不重要。
英文:
Query 3 -- Why it uses the index but is slow...
select data_name
from test
where data_time = '2022-06-15 00:00:00'
and data_value=75.1
needs 3 columns, and all are in a single index. So using that index is probably faster than scanning the table. This is called a "covering index". It is indicated in EXPLAIN
with Using index
. "Leftmost" is not relevant; "covering" is.
Still, it will have to scan the entire index, as indicated by "Rows" being approximately the number of rows in the table.
Your datetime string looks like MySQL's datetime; it would be better to declare the column DATETIME
instead of VARCHAR
.
Query 1 is likely to be fast since you are testing all 3 columns of the index with =
.
Query 2 is likely to be very slow since the columns in the WHERE
are not "leftmost".
select *
from test
where data_time = '2022-06-15 00:00:00'
and data_value=75.1
would benefit either of these:
INDEX(data_time, data_value)
INDEX(data_value, data_time)
Query 4:
where data_name ='abc'
and data_time = '2022-06-15 00:00:00'
and data_value >= 75
and data_value < 76
This will be fast because "leftmost" lets the index be used.
Query 5:
where data_name ='abc'
and data_time >= '2022-06-15'
and data_value = 75.1
This will be only somewhat fast because "leftmost" will stop with the data_time
. That is, it stopped after the first "range" test. Adding this would make query 5 fast and fully use the index:
INDEX(data_name, data_value, -- tested with "="
data_time) -- tested with "range"
Note that the order of columns in the index important; the order of tests in WHERE
is not.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论