MySQL通过连接进行更新 – 为什么速度慢?

huangapple go评论70阅读模式
英文:

MySQL update via join - why is it slow?

问题

以下是您要翻译的代码和文本部分:

I have the following query:

    UPDATE
        `temp_table`,
        `sms`
    SET
        `sms`.`to` = `temp_table`.`new_value`
    WHERE
        `sms`.`to` = `temp_table`.`old_value`;

`temp_table` has ~200,000 rows.

`sms` has ~2,000,000 rows.

`sms`.`to`, `temp_table`.`new_value`, and `temp_table`.`old_value` are VARCHAR(255) with indexes.

Values are UK phone numbers.

The update query is slow it never completes. Does anyone know why?

**Explain:**

[![Result of EXPLAIN][1]][1]

[1]: https://i.stack.imgur.com/swKuO.png

Text Results for Creating Tables:

From phpMyAdmin

CREATE TABLE `temp_table` (
    `old_value` varchar(255) DEFAULT NULL,
    `new_value` varchar(255) DEFAULT NULL
) ENGINE = InnoDB DEFAULT CHARSET = latin1 COLLATE = latin1_swedish_ci;

ALTER TABLE `temp_table`
    ADD KEY `old_value` (`old_value`),
    ADD KEY `new_value` (`new_value`);

CREATE TABLE `sms` (
    `id` int(11) NOT NULL,
    `branch_id` int(11) DEFAULT NULL,
    `customer_id` int(11) DEFAULT NULL,
    `message_id` int(11) DEFAULT NULL,
    `message` text DEFAULT NULL,
    `from` varchar(255) DEFAULT NULL,
    `to` varchar(255) DEFAULT NULL,
    `status` varchar(255) DEFAULT NULL,
    `created_at` int(11) DEFAULT NULL,
    `updated_at` int(11) DEFAULT NULL
) ENGINE = InnoDB DEFAULT CHARSET = utf8 COLLATE = utf8_general_ci;

ALTER TABLE `sms`
    ADD PRIMARY KEY (`id`),
    ADD KEY `idx-sms-branch_id` (`branch_id`),
    ADD KEY `idx-sms-customer_id` (`customer_id`),
    ADD KEY `idx-sms-message_id` (`message_id`),
    ADD KEY `idx-sms-to` (`to`),
    ADD KEY `idx-sms-created_at` (`created_at`),
    ADD KEY `idx-sms-updated_at` (`updated_at`);

ALTER TABLE `sms`
    MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;

希望这有所帮助。如果您需要更多信息,请告诉我。

英文:

I have the following query:

UPDATE
`temp_table`,
`sms`
SET
`sms`.`to` = `temp_table`.`new_value`
WHERE
`sms`.`to` = `temp_table`.`old_value`;

temp_table has ~200,000 rows.

sms has ~2,000,000 rows.

sms.to, temp_table.new_value, and temp_table.old_value are VARCHAR(255) with indexes.

Values are UK phone numbers.

The update query is slow it never completes. Does anyone know why?

Explain:

MySQL通过连接进行更新 – 为什么速度慢?

Text Results for Creating Tables:

From phpMyAdmin

CREATE TABLE `temp_table` (
`old_value` varchar(255) DEFAULT NULL,
`new_value` varchar(255) DEFAULT NULL
) ENGINE = InnoDB DEFAULT CHARSET = latin1 COLLATE = latin1_swedish_ci;
ALTER TABLE `temp_table`
ADD KEY `old_value` (`old_value`),
ADD KEY `new_value` (`new_value`);
CREATE TABLE `sms` (
`id` int(11) NOT NULL,
`branch_id` int(11) DEFAULT NULL,
`customer_id` int(11) DEFAULT NULL,
`message_id` int(11) DEFAULT NULL,
`message` text DEFAULT NULL,
`from` varchar(255) DEFAULT NULL,
`to` varchar(255) DEFAULT NULL,
`status` varchar(255) DEFAULT NULL,
`created_at` int(11) DEFAULT NULL,
`updated_at` int(11) DEFAULT NULL
) ENGINE = InnoDB DEFAULT CHARSET = utf8 COLLATE = utf8_general_ci;
ALTER TABLE `sms`
ADD PRIMARY KEY (`id`),
ADD KEY `idx-sms-branch_id` (`branch_id`),
ADD KEY `idx-sms-customer_id` (`customer_id`),
ADD KEY `idx-sms-message_id` (`message_id`),
ADD KEY `idx-sms-to` (`to`),
ADD KEY `idx-sms-created_at` (`created_at`),
ADD KEY `idx-sms-updated_at` (`updated_at`);
ALTER TABLE `sms`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;

答案1

得分: 2

You can use MySQL Update with Join in your query.

UPDATE sms
JOIN temp_table ON sms.to = temp_table.old_value
SET sms.to = temp_table.new_value;

Also, it would be better if you can index the search column "temp_table.old_value."

ALTER TABLE temp_table ADD INDEX idx_old_value (old_value);

So, search results will be faster.

英文:

You can use MySQL Update with Join in your query.

UPDATE `sms`
JOIN `temp_table` ON `sms`.`to` = `temp_table`.`old_value`
SET `sms`.`to` = `temp_table`.`new_value`;

Also, it would be better if you can indexes the search column "temp_table.old_value".

ALTER TABLE `temp_table` ADD INDEX `idx_old_value` (`old_value`);

So, search result be faster.

答案2

得分: 0

  • 如果优化器选择通过 sms(然后与其他表JOIN)走,那么对于temp_table来说,拥有 INDEX(old_value, new_value)INDEX(old_value) 更好。

  • 为什么这个表中同时包含message_id和消息文本?也许将该配对放入另一个表中会使此查询速度更快。然后,只在该表中保留message_id。如果text通常很大,这一点尤为重要。

  • 一次更新 200K 行需要很长时间和内存。一个计划是每次更新 1K 行。请参考chunking

我无法预测这三个建议中哪一个会帮助最多。如果可行的话,我建议都尝试。

英文:
  • If the Optimizer chooses to walk through sms (and then JOIN to the other table), then it would be better for temp_table to have INDEX(old_value, new_value) _instead of INDEX(old_value).

  • Why are both message_id and the message text in this table? Possibly having that pair in another table would make this query much faster. (Then have only the message_id in this table. This is especially important if the text is often quite large.

  • UPDATEing 200K rows all at once takes much time and RAM. One plan is to do the updates 1K at a time. See chunking.

I can't predict which of these three suggestions will help the most. I recommend doing all of the, if practical.

答案3

得分: 0

你没有ON子句,但你在使用一个WHERE子句:

INNER JOIN和,(逗号)在没有连接条件的情况下在语义上是等效的,它们都在指定的表之间产生笛卡尔积(也就是第一个表中的每一行都与第二个表中的每一行都连接)。

...

与ON一起使用的search_condition是可以在WHERE子句中使用的条件表达式的形式。通常,ON子句用于指定连接表的条件,而WHERE子句限制了要包含在结果集中的行。

在你的情况下,最终结果是相同的(但在其他情况下不是)。从概念上讲,在表已经以笛卡尔积的方式连接之后,"where"筛选发生,查询优化器肯定可以处理这个并"下推"where谓词,但我建议首先使用专用语法指示MySQL以构建一个明智的执行计划。

如果你注意到关于更新的文档中说:

table_references子句列出了参与连接的表。其语法在第13.2.13.2节“JOIN子句”中描述。下面是一个示例:

UPDATE items,month SET items.price=month.price WHERE
items.id=month.id;

上面的示例显示了一个使用逗号运算符的内连接,但多表UPDATE语句可以使用SELECT语句中允许的任何类型的连接,如LEFT JOIN。

因此,MySQL文档中使用的示例使用了其中一种可能的连接,那就是笛卡尔积。

英文:

You don't have an ON clause, but you are using a WHERE clause:

> INNER JOIN and , (comma) are semantically equivalent in the absence of a join condition: both produce a Cartesian product between the specified tables (that is, each and every row in the first table is joined to each and every row in the second table).

...
>
> The search_condition used with ON is any conditional expression of the form that can be used in a WHERE clause. Generally, the ON clause serves for conditions that specify how to join tables, and the WHERE clause restricts which rows to include in the result set.

In your case the end result is the same (but in other cases is not). Conceptually the "where" filtering happens after the tables have been joined in a Cartesian product, the query optimizer for sure can do something about it and "push down" the where predicate but I would suggest for starters to instruct MySQL with the dedicated syntax to allow it to build an informed execution plan.

If you notice in the documentation about update it says:

> The table_references clause lists the tables involved in the join. Its syntax is described in Section 13.2.13.2, “JOIN Clause”. Here is an example:
>

> UPDATE items,month SET items.price=month.price WHERE
> items.id=month.id;

> The preceding example shows an inner join that uses the comma
> operator, but multiple-table UPDATE statements can use any type of
> join permitted in SELECT statements, such as LEFT JOIN.

So the example used in MySQL docs uses one of the possible joins available and that one is the cartesian product

答案4

得分: 0

The Explain of the Update statement shows the usage of 'old_value' (possible_keys) for the temp_table, and the select_type is 'UPDATE' for the sms table, both are different from the OP output. (see further below for my output).

Besides that there is also an inconsistency in the COLLATE and CHARSET definitions for the two tables, which could also have some side effects in performance.

Table temp_table:

Field Type Null Key Default Extra
old_value varchar(255) YES MUL NULL
new_value varchar(255) YES MUL NULL

MySQL version: 8.0.32

Good Luck

英文:

The Explain of the Update statement shows the usage of 'old_value' (possible_keys) for the temp_table, and the select_type is 'UPDATE' for the sms table, both are different from the OP output. (see further below for my output).

Besides that there is also an inconsistency in the COLLATE and CHARSET definitions for the two tables, which could also have some sideeffects in performance.

describe UPDATE     `temp_table`,     `sms` SET     `sms`.`to` = `temp_table`.`new_value` WHERE     `sms`.`to` = `temp_table`.`old_value`;
+----+-------------+------------+------------+------+---------------+------------+---------+--------------------------+-------+----------+-------------+
| id | select_type | table      | partitions | type | possible_keys | key        | key_len | ref                      | rows  | filtered | Extra       |
+----+-------------+------------+------------+------+---------------+------------+---------+--------------------------+-------+----------+-------------+
|  1 | SIMPLE      | temp_table | NULL       | ALL  | old_value     | NULL       | NULL    | NULL                     | 21702 |   100.00 | Using where |
|  1 | UPDATE      | sms        | NULL       | ref  | idx-sms-to    | idx-sms-to | 768     | sms.temp_table.old_value |     1 |   100.00 | Using where |
+----+-------------+------------+------------+------+---------------+------------+---------+--------------------------+-------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)

Testing with about 20k sms with 20k temp_table, my humble machine does it in 0.63seconds, where all records are updated. Note: only the relevant fields for the update query has been with unique values.

The Update:

UPDATE     `temp_table`,     `sms` SET     `sms`.`to` = `temp_table`.`new_value` WHERE     `sms`.`to` = `temp_table`.`old_value`;
Query OK, 20000 rows affected (0.68 sec)
Rows matched: 20000  Changed: 20000  Warnings: 0

Table temp_table:

describe temp_table;
+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| old_value | varchar(255) | YES  | MUL | NULL    |       |
| new_value | varchar(255) | YES  | MUL | NULL    |       |
+-----------+--------------+------+-----+---------+-------+
2 rows in set (0.01 sec)
mysql  Ver 8.0.32

Good Luck

huangapple
  • 本文由 发表于 2023年3月31日 23:55:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/75900516.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定