如果复合索引中列的顺序更改,LIKE 语句的性能会大幅下降。

huangapple go评论56阅读模式
英文:

Huge performance drop for LIKE statement if column order changes in composite index

问题

编辑:
在使用src_cus_lang时,解释了前两个查询的结果:
如果复合索引中列的顺序更改,LIKE 语句的性能会大幅下降。
在使用cust_lang_src时,解释了最后两个查询的结果:
如果复合索引中列的顺序更改,LIKE 语句的性能会大幅下降。

原帖:
我正在使用MySQL,有如下表格:

CREATE TABLE tm(
    id INT AUTO_INCREMENT PRIMARY KEY,
    source TEXT(7000) NOT NULL,
    target TEXT(6000) NOT NULL,
    language CHAR(3),
    customer VARCHAR(10))

当使用复合索引(source(755), customer, language)时,提取时间如下:

SELECT source FROM tm WHERE customer = 'Customer A' AND language = 'ENU' AND source LIKE '%.net%';	-- 0.015秒
SELECT source FROM tm WHERE source LIKE '%.net%' AND customer = 'Customer A' AND language = 'ENU';	-- 0.015秒

然而,当使用复合索引(customer, language, source(755))时,提取时间急剧增加:

SELECT source FROM tm WHERE customer = 'Customer A' AND language = 'ENU' AND source LIKE '%.net%';	-- 1.7秒
SELECT source FROM tm WHERE source LIKE '%.net%' AND customer = 'Customer A' AND language = 'ENU';	-- 1.7秒

这与我在互联网上找到的某处的信息相矛盾,该信息声称左前缀规则一直适用,直到遇到<>BETWEENLIKE为止。根据这个说法,(customer, language, source(755))索引在提取时间方面会比(source(755), customer, language)索引更快,因为sourceLIKE一起使用。我无法发布源代码,因为它不是英语。这个说法是否错误?如果不是,为什么我看到相反的结果?

英文:

Edit:
EXPLAIN result for the first two queries when using src_cus_lang:
如果复合索引中列的顺序更改,LIKE 语句的性能会大幅下降。
EXPLAIN result for the last two queries when using cust_lang_src:
如果复合索引中列的顺序更改,LIKE 语句的性能会大幅下降。

Original post:
I'm using MySQL and have a table as follows:

CREATE TABLE tm(
    id INT AUTO_INCREMENT PRIMARY KEY,
    source TEXT(7000) NOT NULL,
    target TEXT(6000) NOT NULL,
    language CHAR(3),
    customer VARCHAR(10))

When using composite index (source(755), customer, language), the fetch times are as follows:

SELECT source FROM tm WHERE customer = &#39;Customer A&#39; AND language = &#39;ENU&#39; AND source LIKE &#39;%.net%&#39;;	-- 0.015s
SELECT source FROM tm WHERE source LIKE &#39;%.net%&#39; AND customer = &#39;Customer A&#39; AND language = &#39;ENU&#39;;	-- 0.015s

However, when using composite index (customer, language, source(755)), the fetch times increased drastically:

SELECT source FROM tm WHERE customer = &#39;Customer A&#39; AND language = &#39;ENU&#39; AND source LIKE &#39;%.net%&#39;;	-- 1.7s
SELECT source FROM tm WHERE source LIKE &#39;%.net%&#39; AND customer = &#39;Customer A&#39; AND language = &#39;ENU&#39;;	-- 1.7s

This is contrary to what I found somewhere on the Internet, which claims that the leftmost prefix rule is applied until &lt;, &gt;, BETWEEN or LIKE is met. According to this claim, the (customer, language, source(755)) index would be faster than (source(755), customer, language) index regarding fetch time, since source is used with LIKE. I can't post the source, because it's not in English. Is this claim wrong? If not, why did I see contrary results?

答案1

得分: 2

Your condition source LIKE &#39;%.net%&#39; cannot use an index. The wildcard at the start of your pattern makes the index useless because the index entries are sorted from the start of the string.

类似地,可以将电话簿作为类比。电话簿是按姓名排序的。你能够高效地找到姓名中间包含 "net" 的条目吗?不能,因为电话簿的排序顺序对此毫无帮助。

由于这个条件无法使用索引,(source(755), customer, language) 上的索引无法使用。它无法使用第一列,因此也无法使用后续列。

(customer, language, source(755)) 上的索引可以使用,但仅适用于前两列。LIKE 条件仍然忽略了索引,但至少可以使用前两列缩小搜索范围。

电话簿的比喻在这里也有帮助。如果你搜索姓 "Smith" 且名字中包含 "net" 的人,那么至少你可以将搜索限制在书中的一部分,即姓 "Smith" 与你在姓氏上设定的条件相匹配的人。但名字仍然无法被优化;你只能逐个读取所有姓 "Smith" 的条目并逐一评估它们。

英文:

Your condition source LIKE &#39;%.net%&#39; cannot use an index. The wildcard at the start of your pattern makes the index useless, because the index entries are sorted from the start of the string.

By analogy, think of a telephone book. It's sorted by names. Can you find entries with "net" in the middle of the name efficiently? No, because the sort order of the book doesn't help at all.

Since that condition cannot use an index, the index on (source(755), customer, language) cannot be used. It can't use the first column, so it can't use any subsequent columns.

The index on (customer, language, source(755)) can be used, but only for the first two columns. The LIKE condition still ignores the index, but at least the search can be narrowed down using the first two columns.

The telephone book analogy may help here too. If you search for someone with last name 'Smith' and whose first name contains "net" anywhere within the name, then at least you can limit your search to a subset of the book, people whose last name matches your condition on last name. But the first name still cannot be optimized; you just have to read through all the Smiths and evaluate them one by one.

答案2

得分: 1

这可能是最好的:

PRIMARY KEY(customer, language, id),  -- 以优化查询(可能还有其他)
INDEX(id)   -- 保持AUTO_INCREMENT正常运作

可能你的前两次尝试之所以“快”,是因为它们忽略了索引。使用 EXPLAIN SELECT ... 进行验证。

英文:

(In addition to what Bill says...)

This may be the best:

PRIMARY KEY(customer, language, id),  -- to optimize the query (and maybe others)
INDEX(id)   -- to keep AUTO_INCREMENT happy

Probably your first two tries were "fast" because they ignored the index. Use EXPLAIN SELECT ... to verify.

huangapple
  • 本文由 发表于 2023年2月26日 20:42:03
  • 转载请务必保留本文链接:https://go.coder-hub.com/75572046.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定