MariaDB 10.3 快速,10.5 慢

huangapple go评论61阅读模式
英文:

MariaDB 10.3 Fast, 10.5 Slow

问题

I have a MariaDB 10.3 Galera cluster with a DB that I'm migrating to newer hardware and upgrading to 10.5. When I mysqldump the database from the old server and import it into the new, any JOIN is god-awful slow. Here's an example query:

SELECT *
FROM claim_notes
INNER JOIN claims
ON claim_notes.claim_id = claims.id
INNER JOIN users
ON claim_notes.user_id = users.id
WHERE claim_notes.email_queue = 1

This runs in 0.7 seconds on the old cluster and 3 minutes and 51 seconds on the new cluster. Yes, it's a stupid query, but it demonstrates the point.

Given that the database schema (including all indexes) along with the entire data set is identical across both setups, I can rule out poorly-optimized queries, or something like the optimizer treating them differently because of different data sets.

However, the optimizer does seem to treat them very differently, according to EXPLAIN. Old setup:

+------+-------------+-------------+--------+------------------------------------------------------+-------------+---------+---------------------------------------+------+-------+
| id   | select_type | table       | type   | possible_keys                                        | key         | key_len | ref                                   | rows | Extra |
+------+-------------+-------------+--------+------------------------------------------------------+-------------+---------+---------------------------------------+------+-------+
|    1 | SIMPLE      | claim_notes | ref    | claim_id,email_queue,user_id,claim_type_user_dtstamp | email_queue | 1       | const                                 |    3 |       |
|    1 | SIMPLE      | users       | eq_ref | PRIMARY                                              | PRIMARY     | 4       | guido_commercial.claim_notes.user_id  |    1 |       |
|    1 | SIMPLE      | claims      | eq_ref | PRIMARY                                              | PRIMARY     | 4       | guido_commercial.claim_notes.claim_id |    1 |       |
+------+-------------+-------------+--------+------------------------------------------------------+-------------+---------+---------------------------------------+------+-------+
3 rows in set (0.001 sec)

New setup:

+------+-------------+-------------+------+------------------------------------------------------+-------------+---------+-------+------+------------------------------------+
| id   | select_type | table       | type | possible_keys                                        | key         | key_len | ref   | rows | Extra                              |
+------+-------------+-------------+------+------------------------------------------------------+-------------+---------+-------+------+------------------------------------+
|    1 | SIMPLE      | claims      | ALL  | PRIMARY                                              | NULL        | NULL    | NULL  | 1    |                                    |
|    1 | SIMPLE      | users       | ALL  | PRIMARY                                              | NULL        | NULL    | NULL  | 1    | Using join buffer (flat, BNL join) |
|    1 | SIMPLE      | claim_notes | ref  | claim_id,email_queue,user_id,claim_type_user_dtstamp | email_queue | 1       | const | 5    | Using where                        |
+------+-------------+-------------+------+------------------------------------------------------+-------------+---------+-------+------+------------------------------------+
3 rows in set (0.000 sec)

All of the values being compared (like claim_notes.claim_id, claims.id, etc.) are all indexed (again, same schema across both). So the fact that EXPLAIN shows no key usage for 2 of the 3 steps is strange.

This seems to be key (pun intended). It sounds like a configuration issue... some change that occurred between 10.1 and 10.5 that changed the way indexes are utilized. But after hours of Googling and trial-and-error, I'm not able to find anything to resolve the issue (I did find this one (https://jira.mariadb.org/browse/MDEV-25707) that looked promising, but I tried setting eq_range_index_dive_limit to 0 and no change).

UPDATES:

  • Old Version: 10.3.27-MariaDB-0+deb10u1
  • New Version: 10.5.18-MariaDB-0+deb11u1-log

Ooops.. I originally posted the old as 10.1, so I fixed that.

CREATE TABLE `claims` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `updated` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `created` timestamp NOT NULL DEFAULT current_timestamp(),
  `amount` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=173985 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

CREATE TABLE `claim_notes` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `claim_id` int(10) unsigned NOT NULL,
  `user_id` int(10) unsigned NOT NULL,
  `email_queue` tinyint(4) NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`),
  KEY `claim_id` (`claim_id`),
  KEY `email_queue` (`email_queue`),
  KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=26559123 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

CREATE TABLE `users` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `active` tinyint(4) DEFAULT NULL,
  `username` varchar(32) DEFAULT NULL,
  `password` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `active` (`active`)
) ENGINE=InnoDB AUTO_INCREMENT=576 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

These tables actually have quite a few more columns, but I removed them for brevity since they are completely unrelated to the query in question.

英文:

I have a MariaDB 10.3 Galera cluster with a DB that I'm migrating to newer hardware and upgrading to 10.5. When I mysqldump the database from the old server and import it into the new, any JOIN is god-awful slow. Here's an example query:

SELECT *
  FROM claim_notes
         INNER JOIN claims
                 ON claim_notes.claim_id = claims.id
         INNER JOIN users
                 ON claim_notes.user_id = users.id
 WHERE claim_notes.email_queue = 1

This runs in 0.7 seconds on the old cluster, and 3 minutes and 51 seconds on the new cluster. Yes, it's a stupid query, but it demonstrates the point.

Given that the database schema (including all indexes) along with the entire data set is identical across both setups, I can rule out poorly-optimized queries, or something like the optimizer treating them differently because of different data sets.

However, the optimizer does seem to treat them very differently, according to EXPLAIN. Old setup:

+------+-------------+-------------+--------+------------------------------------------------------+-------------+---------+---------------------------------------+------+-------+
| id   | select_type | table       | type   | possible_keys                                        | key         | key_len | ref                                   | rows | Extra |
+------+-------------+-------------+--------+------------------------------------------------------+-------------+---------+---------------------------------------+------+-------+
|    1 | SIMPLE      | claim_notes | ref    | claim_id,email_queue,user_id,claim_type_user_dtstamp | email_queue | 1       | const                                 |    3 |       |
|    1 | SIMPLE      | users       | eq_ref | PRIMARY                                              | PRIMARY     | 4       | guido_commercial.claim_notes.user_id  |    1 |       |
|    1 | SIMPLE      | claims      | eq_ref | PRIMARY                                              | PRIMARY     | 4       | guido_commercial.claim_notes.claim_id |    1 |       |
+------+-------------+-------------+--------+------------------------------------------------------+-------------+---------+---------------------------------------+------+-------+
3 rows in set (0.001 sec)

New setup:

+------+-------------+-------------+------+------------------------------------------------------+-------------+---------+-------+------+------------------------------------+
| id   | select_type | table       | type | possible_keys                                        | key         | key_len | ref   | rows | Extra                              |
+------+-------------+-------------+------+------------------------------------------------------+-------------+---------+-------+------+------------------------------------+
|    1 | SIMPLE      | claims      | ALL  | PRIMARY                                              | NULL        | NULL    | NULL  | 1    |                                    |
|    1 | SIMPLE      | users       | ALL  | PRIMARY                                              | NULL        | NULL    | NULL  | 1    | Using join buffer (flat, BNL join) |
|    1 | SIMPLE      | claim_notes | ref  | claim_id,email_queue,user_id,claim_type_user_dtstamp | email_queue | 1       | const | 5    | Using where                        |
+------+-------------+-------------+------+------------------------------------------------------+-------------+---------+-------+------+------------------------------------+
3 rows in set (0.000 sec)

All of the values being compared (like claim_notes.claim_id, claims.id, etc.) are all indexed (again, same schema across both). So the fact that EXPLAIN shows no key usage for 2 of the 3 steps is strange.

This seems to be key (pun intended). It sounds like a configuration issue.. some change that occurred between 10.1 and 10.5 that changed the way indexes are utilized. But after hours of Googling and trial-and-error, I'm not able to find anything to resolve the issue (I did find this one (https://jira.mariadb.org/browse/MDEV-25707) that looked promising, but I tried setting eq_range_index_dive_limit to 0 and no change).

UPDATES:

  • Old Version: 10.3.27-MariaDB-0+deb10u1
  • New Version: 10.5.18-MariaDB-0+deb11u1-log

Ooops.. I originally posted the old as 10.1, so I fixed that.

CREATE TABLE `claims` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `updated` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `created` timestamp NOT NULL DEFAULT current_timestamp(),
  `amount` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=173985 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

CREATE TABLE `claim_notes` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `claim_id` int(10) unsigned NOT NULL,
  `user_id` int(10) unsigned NOT NULL,
  `email_queue` tinyint(4) NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`),
  KEY `claim_id` (`claim_id`),
  KEY `email_queue` (`email_queue`),
  KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=26559123 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

CREATE TABLE `users` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `active` tinyint(4) DEFAULT NULL,
  `username` varchar(32) DEFAULT NULL,
  `password` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `active` (`active`)
) ENGINE=InnoDB AUTO_INCREMENT=576 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

These tables actually have quite a few more columns, but I removed them for brevity since they are completely unrelated to the query in questions.

答案1

得分: 1

当你升级任何软件时,某些功能可能会有些不同。作为数据库的MariaDB添加了各种优化器类型。

最终,查询计划可能在早期版本中基于统计数据而幸运。当出现慢查询时,重新查看索引基础,确保MariaDB能够做出良好的决策。

由于严格的where条件位于email_queue上,并假设它高度选择性,将claim_iduser_id附加到索引中,这将为查询提供连接到其他claimuser表所需的信息。请记住查询在查询中每个表使用单个索引的原则。使用多列索引尽量帮助查询。

统计数据是关于表的摘要信息,可以从数据中获取。在查询的任何表上使用ANALYZE TABLE将有助于优化器准确表示所讨论的表。

在所提出的问题中,另一个奇怪之处在于claim_notes中的id是否是一个好的主键。如果claim_iduser_id的组合是唯一的,也许这是更好的主键。

在索引方面,请记住主键在所有辅助索引的末尾是隐式存在的。

英文:

As you upgrade any software, some things may work a little different. MariaDB as a database adds various optimizer types.

Ultimately the query plan may have been lucky in the earlier version based on statistics. When getting a slow query, relook at the indexing basics for making sure that its easy for MariaDB to make good decisions.

As a strict where condition is on the email_queue is there, and assuming its highly selective, appending to the index this would use by putting claim_id,user_id, gives the query the information it needs to join to the other claim and user tables quickly. Remember the principles that a query uses a single index per table in a query. Use multiple column indexes to help out queries as much as possible.

The statistics, a summary information about a table can get out of data. Using ANALYZE TABLE on any tables of the query will help the optimizer have a accurate representation of the table in question.

Another oddity in the presented question is in the claim_notes is id is a good primary key. If claim_id, user_id in combination is unique, maybe that is a better primary key.

With indexing remember that the primary key is invisibly there after the end of all the secondary indexes.

huangapple
  • 本文由 发表于 2023年3月23日 11:41:34
  • 转载请务必保留本文链接:https://go.coder-hub.com/75819077.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定