2023年2月26日 20:42:03go评论72阅读模式

英文:

Huge performance drop for LIKE statement if column order changes in composite index

问题

编辑：
在使用src_cus_lang时，解释了前两个查询的结果：

在使用cust_lang_src时，解释了最后两个查询的结果：

原帖：
我正在使用MySQL，有如下表格：

CREATE TABLE tm(
    id INT AUTO_INCREMENT PRIMARY KEY,
    source TEXT(7000) NOT NULL,
    target TEXT(6000) NOT NULL,
    language CHAR(3),
    customer VARCHAR(10))

当使用复合索引(source(755), customer, language)时，提取时间如下：

SELECT source FROM tm WHERE customer = 'Customer A' AND language = 'ENU' AND source LIKE '%.net%';	-- 0.015秒
SELECT source FROM tm WHERE source LIKE '%.net%' AND customer = 'Customer A' AND language = 'ENU';	-- 0.015秒

然而，当使用复合索引(customer, language, source(755))时，提取时间急剧增加：

SELECT source FROM tm WHERE customer = 'Customer A' AND language = 'ENU' AND source LIKE '%.net%';	-- 1.7秒
SELECT source FROM tm WHERE source LIKE '%.net%' AND customer = 'Customer A' AND language = 'ENU';	-- 1.7秒

这与我在互联网上找到的某处的信息相矛盾，该信息声称左前缀规则一直适用，直到遇到<、>、BETWEEN或LIKE为止。根据这个说法，(customer, language, source(755))索引在提取时间方面会比(source(755), customer, language)索引更快，因为source与LIKE一起使用。我无法发布源代码，因为它不是英语。这个说法是否错误？如果不是，为什么我看到相反的结果？

英文:

Edit:
EXPLAIN result for the first two queries when using src_cus_lang:

EXPLAIN result for the last two queries when using cust_lang_src:

Original post:
I'm using MySQL and have a table as follows:

CREATE TABLE tm(
    id INT AUTO_INCREMENT PRIMARY KEY,
    source TEXT(7000) NOT NULL,
    target TEXT(6000) NOT NULL,
    language CHAR(3),
    customer VARCHAR(10))

When using composite index (source(755), customer, language), the fetch times are as follows:

SELECT source FROM tm WHERE customer = &#39;Customer A&#39; AND language = &#39;ENU&#39; AND source LIKE &#39;%.net%&#39;;	-- 0.015s
SELECT source FROM tm WHERE source LIKE &#39;%.net%&#39; AND customer = &#39;Customer A&#39; AND language = &#39;ENU&#39;;	-- 0.015s

However, when using composite index (customer, language, source(755)), the fetch times increased drastically:

SELECT source FROM tm WHERE customer = &#39;Customer A&#39; AND language = &#39;ENU&#39; AND source LIKE &#39;%.net%&#39;;	-- 1.7s
SELECT source FROM tm WHERE source LIKE &#39;%.net%&#39; AND customer = &#39;Customer A&#39; AND language = &#39;ENU&#39;;	-- 1.7s

This is contrary to what I found somewhere on the Internet, which claims that the leftmost prefix rule is applied until <, >, BETWEEN or LIKE is met. According to this claim, the (customer, language, source(755)) index would be faster than (source(755), customer, language) index regarding fetch time, since source is used with LIKE. I can't post the source, because it's not in English. Is this claim wrong? If not, why did I see contrary results?

答案1

得分: 2

Your condition source LIKE '%.net%' cannot use an index. The wildcard at the start of your pattern makes the index useless because the index entries are sorted from the start of the string.

类似地，可以将电话簿作为类比。电话簿是按姓名排序的。你能够高效地找到姓名中间包含 "net" 的条目吗？不能，因为电话簿的排序顺序对此毫无帮助。

由于这个条件无法使用索引，(source(755), customer, language) 上的索引无法使用。它无法使用第一列，因此也无法使用后续列。

(customer, language, source(755)) 上的索引可以使用，但仅适用于前两列。LIKE 条件仍然忽略了索引，但至少可以使用前两列缩小搜索范围。

电话簿的比喻在这里也有帮助。如果你搜索姓 "Smith" 且名字中包含 "net" 的人，那么至少你可以将搜索限制在书中的一部分，即姓 "Smith" 与你在姓氏上设定的条件相匹配的人。但名字仍然无法被优化；你只能逐个读取所有姓 "Smith" 的条目并逐一评估它们。

英文:

Your condition source LIKE '%.net%' cannot use an index. The wildcard at the start of your pattern makes the index useless, because the index entries are sorted from the start of the string.

By analogy, think of a telephone book. It's sorted by names. Can you find entries with "net" in the middle of the name efficiently? No, because the sort order of the book doesn't help at all.

Since that condition cannot use an index, the index on (source(755), customer, language) cannot be used. It can't use the first column, so it can't use any subsequent columns.

The index on (customer, language, source(755)) can be used, but only for the first two columns. The LIKE condition still ignores the index, but at least the search can be narrowed down using the first two columns.

The telephone book analogy may help here too. If you search for someone with last name 'Smith' and whose first name contains "net" anywhere within the name, then at least you can limit your search to a subset of the book, people whose last name matches your condition on last name. But the first name still cannot be optimized; you just have to read through all the Smiths and evaluate them one by one.

答案2

得分: 1

这可能是最好的：

PRIMARY KEY(customer, language, id),  -- 以优化查询（可能还有其他）
INDEX(id)   -- 保持AUTO_INCREMENT正常运作

可能你的前两次尝试之所以“快”，是因为它们忽略了索引。使用 EXPLAIN SELECT ... 进行验证。

英文:

(In addition to what Bill says...)

This may be the best:

PRIMARY KEY(customer, language, id),  -- to optimize the query (and maybe others)
INDEX(id)   -- to keep AUTO_INCREMENT happy

Probably your first two tries were "fast" because they ignored the index. Use EXPLAIN SELECT ... to verify.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如果复合索引中列的顺序更改，LIKE 语句的性能会大幅下降。

问题

答案1

答案2

有没有办法在MySQL中连接子查询时停止反复编写相同的子查询？

如何在Python中高效地对大量点应用3D变换？

What is more efficient? Initializing an array with an element or pushing an element to an empty array?

有没有一种更快的方法来执行非等值连接并在R中找到连接值的最大值？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论