英文:
Huge performance drop for LIKE statement if column order changes in composite index
问题
编辑:
在使用src_cus_lang
时,解释了前两个查询的结果:
在使用cust_lang_src
时,解释了最后两个查询的结果:
原帖:
我正在使用MySQL,有如下表格:
CREATE TABLE tm(
id INT AUTO_INCREMENT PRIMARY KEY,
source TEXT(7000) NOT NULL,
target TEXT(6000) NOT NULL,
language CHAR(3),
customer VARCHAR(10))
当使用复合索引(source(755), customer, language)
时,提取时间如下:
SELECT source FROM tm WHERE customer = 'Customer A' AND language = 'ENU' AND source LIKE '%.net%'; -- 0.015秒
SELECT source FROM tm WHERE source LIKE '%.net%' AND customer = 'Customer A' AND language = 'ENU'; -- 0.015秒
然而,当使用复合索引(customer, language, source(755))
时,提取时间急剧增加:
SELECT source FROM tm WHERE customer = 'Customer A' AND language = 'ENU' AND source LIKE '%.net%'; -- 1.7秒
SELECT source FROM tm WHERE source LIKE '%.net%' AND customer = 'Customer A' AND language = 'ENU'; -- 1.7秒
这与我在互联网上找到的某处的信息相矛盾,该信息声称左前缀规则一直适用,直到遇到<
、>
、BETWEEN
或LIKE
为止。根据这个说法,(customer, language, source(755))
索引在提取时间方面会比(source(755), customer, language)
索引更快,因为source
与LIKE
一起使用。我无法发布源代码,因为它不是英语。这个说法是否错误?如果不是,为什么我看到相反的结果?
英文:
Edit:
EXPLAIN result for the first two queries when using src_cus_lang:
EXPLAIN result for the last two queries when using cust_lang_src:
Original post:
I'm using MySQL and have a table as follows:
CREATE TABLE tm(
id INT AUTO_INCREMENT PRIMARY KEY,
source TEXT(7000) NOT NULL,
target TEXT(6000) NOT NULL,
language CHAR(3),
customer VARCHAR(10))
When using composite index (source(755), customer, language)
, the fetch times are as follows:
SELECT source FROM tm WHERE customer = 'Customer A' AND language = 'ENU' AND source LIKE '%.net%'; -- 0.015s
SELECT source FROM tm WHERE source LIKE '%.net%' AND customer = 'Customer A' AND language = 'ENU'; -- 0.015s
However, when using composite index (customer, language, source(755))
, the fetch times increased drastically:
SELECT source FROM tm WHERE customer = 'Customer A' AND language = 'ENU' AND source LIKE '%.net%'; -- 1.7s
SELECT source FROM tm WHERE source LIKE '%.net%' AND customer = 'Customer A' AND language = 'ENU'; -- 1.7s
This is contrary to what I found somewhere on the Internet, which claims that the leftmost prefix rule is applied until <
, >
, BETWEEN
or LIKE
is met. According to this claim, the (customer, language, source(755))
index would be faster than (source(755), customer, language)
index regarding fetch time, since source
is used with LIKE
. I can't post the source, because it's not in English. Is this claim wrong? If not, why did I see contrary results?
答案1
得分: 2
Your condition source LIKE '%.net%'
cannot use an index. The wildcard at the start of your pattern makes the index useless because the index entries are sorted from the start of the string.
类似地,可以将电话簿作为类比。电话簿是按姓名排序的。你能够高效地找到姓名中间包含 "net" 的条目吗?不能,因为电话簿的排序顺序对此毫无帮助。
由于这个条件无法使用索引,(source(755), customer, language)
上的索引无法使用。它无法使用第一列,因此也无法使用后续列。
(customer, language, source(755))
上的索引可以使用,但仅适用于前两列。LIKE
条件仍然忽略了索引,但至少可以使用前两列缩小搜索范围。
电话簿的比喻在这里也有帮助。如果你搜索姓 "Smith" 且名字中包含 "net" 的人,那么至少你可以将搜索限制在书中的一部分,即姓 "Smith" 与你在姓氏上设定的条件相匹配的人。但名字仍然无法被优化;你只能逐个读取所有姓 "Smith" 的条目并逐一评估它们。
英文:
Your condition source LIKE '%.net%'
cannot use an index. The wildcard at the start of your pattern makes the index useless, because the index entries are sorted from the start of the string.
By analogy, think of a telephone book. It's sorted by names. Can you find entries with "net" in the middle of the name efficiently? No, because the sort order of the book doesn't help at all.
Since that condition cannot use an index, the index on (source(755), customer, language)
cannot be used. It can't use the first column, so it can't use any subsequent columns.
The index on (customer, language, source(755))
can be used, but only for the first two columns. The LIKE
condition still ignores the index, but at least the search can be narrowed down using the first two columns.
The telephone book analogy may help here too. If you search for someone with last name 'Smith' and whose first name contains "net" anywhere within the name, then at least you can limit your search to a subset of the book, people whose last name matches your condition on last name. But the first name still cannot be optimized; you just have to read through all the Smiths and evaluate them one by one.
答案2
得分: 1
这可能是最好的:
PRIMARY KEY(customer, language, id), -- 以优化查询(可能还有其他)
INDEX(id) -- 保持AUTO_INCREMENT正常运作
可能你的前两次尝试之所以“快”,是因为它们忽略了索引。使用 EXPLAIN SELECT ...
进行验证。
英文:
(In addition to what Bill says...)
This may be the best:
PRIMARY KEY(customer, language, id), -- to optimize the query (and maybe others)
INDEX(id) -- to keep AUTO_INCREMENT happy
Probably your first two tries were "fast" because they ignored the index. Use EXPLAIN SELECT ...
to verify.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论