2023年6月13日 02:18:20go评论117阅读模式

英文:

Why is SELECT...FOR UPDATE not working if executed inside a CTE that is not used later in the transaction?

问题

我有一个简单的表格如下所示：

这个表格当前有两行：

id | name
----+---------
0 | Matthew
1 | Thomas
(2 行)


现在我有两个同时运行的事务：

事务 1

BEGIN;

WITH locked_rows AS(
SELECT *
FROM TEST
FOR UPDATE
)
INSERT INTO TEST
VALUES
(1, 'Chris'),
(0, 'John')
ON CONFLICT(id) DO
UPDATE
SET name = excluded.name
WHERE test.name <> excluded.name

COMMIT;

事务 2

BEGIN;

WITH locked_rows AS(
SELECT *
FROM TEST
FOR UPDATE
)
INSERT INTO TEST
VALUES
(0, 'Bryan'),
(1, 'Steven')
ON CONFLICT(id) DO
UPDATE
SET name = excluded.name
WHERE test.name <> excluded.name

COMMIT;


故意将要插入的值在两个事务之间以相反的顺序插入，因为我想触发一个死锁并看看是否可以通过使用 `SELECT...FOR UPDATE;` 来避免。现在我明白改变顺序将会防止死锁，但我仍然想了解这里发生了什么。

当这两个事务同时执行时，它们总是会发生死锁，并显示错误：

ERROR: 死锁检测到
DETAIL: 进程 29588 等待事务 255002 上的 ShareLock; 被进程 29010 阻塞。
进程 29010 等待事务 255003 上的 ShareLock; 被进程 29588 阻塞。
HINT: 请参阅服务器日志以获取查询详情。
CONTEXT: 在关系 "test" 中插入索引元组 (0,27)
SQL 状态: 40P01


但是如果我移除 CTE 并简单地将每个事务拆分为两个单独的命令，例如：

BEGIN;

SELECT * FROM TEST FOR UPDATE;

INSERT INTO TEST
VALUES
(1, 'Chris'),
(0, 'John')
ON CONFLICT(id) DO
UPDATE
SET name = excluded.name
WHERE test.name <> excluded.name

COMMIT;


对于另一个事务也是一样，那么死锁就会消失，第二个事务只会等待第一个事务释放对行的锁，这是我所期望的。

我希望在使用 CTE 时也会发生同样的情况，因为 `SELECT...FOR UPDATE` 仍然在事务中执行，但事实并非如此。我还尝试过声明 CTE 为 `AS MATERIALIZED`，看看是否会有所不同，但没有任何变化。
根据 [Postgres 文档](https://www.postgresql.org/docs/current/sql-select.html#SQL-FOR-UPDATE-SHARE) 中的描述：

&gt; 如果在锁定子句中指定了特定的表格，则只锁定来自这些表格的行；在 SELECT 中使用的任何其他表格都会像平常一样只读取。没有表格列表的锁定子句会影响语句中使用的所有表格。如果将锁定子句应用于视图或子查询，则会影响视图或子查询中使用的所有表格。然而，这些子句不适用于主查询引用的 WITH 查询。如果要在 WITH 查询内发生行锁定，请在 WITH 查询中指定锁定子句。

但我真的不理解它的含义，我认为这是在提及 `FOR UPDATE` 中的可选参数 `[ OF table_name [, ...] ]`。

Postgres 是完全忽略 CTE 吗，因为它在后续查询中没有被使用吗？如果是这样的话，那么为什么其他命令如 `DELETE` 或 `UPDATE` 在未使用的 CTE 中能正常工作呢？或者是在幕后发生了其他事情？
注意：我使用的是 PostgreSQL 15.2

英文:

I have a simple table that is as seen below:

Table &quot;public.test&quot;
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 id     | integer |           | not null | 
 name   | text    |           |          | 
Indexes:
    &quot;ss_pkey&quot; PRIMARY KEY, btree (id)

The table currently has two rows:

id |  name   
----+---------
  0 | Matthew
  1 | Thomas
(2 rows)

Now I have two transactions running concurrently:

Transaction 1

BEGIN;

WITH locked_rows AS(
	SELECT *
	FROM TEST
	FOR UPDATE
)
INSERT INTO TEST
VALUES 
(1, &#39;Chris&#39;),
(0, &#39;John&#39;) 
ON CONFLICT(id) DO
UPDATE
SET name = excluded.name
WHERE test.name &lt;&gt; excluded.name

COMMIT;

Transaction 2

BEGIN;

WITH locked_rows AS(
	SELECT *
	FROM TEST
	FOR UPDATE
)
INSERT INTO TEST
VALUES 
(0, &#39;Bryan&#39;),
(1, &#39;Steven&#39;) 
ON CONFLICT(id) DO
UPDATE
SET name = excluded.name
WHERE test.name &lt;&gt; excluded.name

COMMIT;

The values to be inserted are purposefully inserted in reverse order between the transactions because I want to trigger a deadlock and see if it can be avoided by using SELECT...FOR UPDATE;. Now I understand that changing the order will prevent the deadlock but I still want to understand what's happening here.

When these two transactions are executed concurrently, they always deadlock with the error:

ERROR:  deadlock detected
DETAIL:  Process 29588 waits for ShareLock on transaction 255002; blocked by process 29010.
Process 29010 waits for ShareLock on transaction 255003; blocked by process 29588.
HINT:  See server log for query details.
CONTEXT:  while inserting index tuple (0,27) in relation &quot;test&quot;
SQL state: 40P01

but if I remove the CTE and simply break each transaction into two separate commands, e.g:

BEGIN;

SELECT * FROM TEST FOR UPDATE;

INSERT INTO TEST
VALUES 
(1, &#39;Chris&#39;),
(0, &#39;John&#39;) 
ON CONFLICT(id) DO
UPDATE
SET name = excluded.name
WHERE test.name &lt;&gt; excluded.name

COMMIT;

and the same for the other transaction, then the deadlocks disappear and the second transaction simply waits for the first one to release the lock on the rows, which is what I expect to happen.

I would expect the same to happen when I use a CTE since the SELECT...FOR UPDATE is still executed in the transaction, but that is not the case. I have also tried declaring the CTE AS MATERIALIZED to see if that will make a difference but nothing changed.
From the Postgres docs, I see this:
> If specific tables are named in a locking clause, then only rows coming from those tables are locked; any other tables used in the SELECT are simply read as usual. A locking clause without a table list affects all tables used in the statement. If a locking clause is applied to a view or sub-query, it affects all tables used in the view or sub-query. However, these clauses do not apply to WITH queries referenced by the primary query. If you want row locking to occur within a WITH query, specify a locking clause within the WITH query.

but I don't really understand what it means and I think this is referring to the [ OF table_name [, ...] ] optional argument in the FOR UPDATE.

Is postgres completely ignoring the CTE because it is not used in subsequent queries? If that's the case, then why do other commands such as DELETE or UPDATE work properly inside an unused CTE? Or is something else happening behind the scenes?

Note: I'm using PostgreSQL 15.2

答案1

得分: 2

你出现死锁是因为通用表达式（CTE）从未被执行。

只有包含数据操纵语句（DML语句）或在主语句中使用的CTE才会被执行。请参阅文档：

在WITH中执行的数据修改语句仅执行一次，并始终完成，无论主查询是否读取它们的输出（甚至是否读取）。请注意，这与WITH中的SELECT的规则不同：如前一节所述，SELECT的执行仅限于主查询需要其输出的程度。

关于使用SELECT ... FOR UPDATE来避免死锁，有两个注意事项：

除非您计划删除行或修改关键字，否则应该使用FOR NO KEY UPDATE以避免过度锁定。
两个会话运行SELECT ... FROM tab FOR NO KEY UPDATE仍然可能相互死锁。
- 因为同步的顺序扫描（请参阅synchronize_seqscans参数）
- 如果表上存在并发更新，并且两个SELECT的快照以不同的顺序查看行

英文:

You are getting a deadlock because the CTE is never executed.

A CTE is only executed if it contains a DML statement or if it is used in the main statement. See the documentation:

> Data-modifying statements in WITH are executed exactly once, and always to completion, independently of whether the primary query reads all (or indeed any) of their output. Notice that this is different from the rule for SELECT in WITH: as stated in the previous section, execution of a SELECT is carried only as far as the primary query demands its output.

Two notes about avoiding deadlocks with SELECT ... FOR UPDATE:

unless you plan to delete the row or modify a key, you should use FOR NO KEY UPDATE to avoid excessive locking
two sessions running SELECT ... FROM tab FOR NO KEY UPDATE could still deadlock with each other
- because of synchronized sequential scans (see the synchronize_seqscans parameter)
- if there are concurrent updates on the table, and the snapshots of both SELECTs see the rows in different orders

答案2

得分: 0

两个事务都在尝试在同一行上同时获取“独占”锁定。

在第一个事务中，您使用了一个公共表达式（CTE）来选择测试表中的所有行，并使用FOR UPDATE来锁定它们。这意味着第一个事务将在测试表中的所有行上获取“独占”锁定。

在第二个事务中，您也使用了一个CTE来选择测试表中的所有行。然而，这个CTE没有使用FOR UPDATE子句。这意味着第二个事务只会在测试表中的行上获取“共享锁”。

当第二个事务尝试更新已被第一个事务锁定的行时，将会发生“死锁”。

为了避免这种死锁，您可以：

在两个CTE中都使用FOR UPDATE子句
使用一个单独的事务来执行选择和更新

在两个CTE中都使用FOR UPDATE子句

BEGIN;

WITH locked_rows AS(
    SELECT *
    FROM TEST
    FOR UPDATE
),
updated_rows AS(
    INSERT INTO TEST
    VALUES 
    (1, 'Chris'),
    (0, 'John') 
    ON CONFLICT(id) DO
    UPDATE
    SET name = excluded.name
    WHERE test.name <> excluded.name
)
SELECT *
FROM locked_rows
JOIN updated_rows USING (id);

COMMIT;

使用单个事务执行选择和更新：

BEGIN;

SELECT *
FROM TEST
FOR UPDATE;

INSERT INTO TEST
VALUES 
(1, 'Chris'),
(0, 'John') 
ON CONFLICT(id) DO
UPDATE
SET name = excluded.name
WHERE test.name <> excluded.name;

COMMIT;

第二个CTE也使用了FOR UPDATE子句。然而，仍然会发生死锁，因为第二个事务试图在第一个事务已锁定的行上获取“独占”锁。

第二个事务能够在测试表中的行上获取“共享锁”是因为第一个事务已经在这些行上获取了“独占锁”。这是因为FOR UPDATE子句是一个“阻塞锁”。这意味着第二个事务将等待第一个事务释放对这些行的“独占”锁，然后才能在这些行上获取“共享”锁。

然而，当第二个事务尝试更新已被第一个事务锁定的行时，将会发生死锁，因为第二个事务现在试图在该行上获取“独占”锁。这将导致第一个事务被阻塞，第二个事务也被阻塞。这将创建一个无法解决的“死锁”情况。为了避免这种死锁，您可以在两个CTE中都使用FOR UPDATE子句，或者您可以使用一个单独的事务来执行选择和更新。

英文:

both transactions are trying to acquire an exclusive lock on the same row at the same time.

In the first transaction, you are using a CTE to select all rows from the test table and lock them with FOR UPDATE. This means that the first transaction will acquire an exclusive lock on all rows in the test table.

In the second transaction, you are also using a CTE to select all rows from the test table. However, this CTE does not use the FOR UPDATE clause. This means that the second transaction will only acquire a shared lock on the rows in the test table.

When the second transaction tries to update a row that is already locked by the first transaction, a deadlock will occur.

To avoid this deadlock, you can either:

Use the FOR UPDATE clause in both CTEs
Use a single transaction to perform both the select and the update

Using FOR UPDATE clause in both CTEs

BEGIN;

WITH locked_rows AS(
    SELECT *
    FROM TEST
    FOR UPDATE
),
updated_rows AS(
    INSERT INTO TEST
    VALUES 
    (1, &#39;Chris&#39;),
    (0, &#39;John&#39;) 
    ON CONFLICT(id) DO
    UPDATE
    SET name = excluded.name
    WHERE test.name &lt;&gt; excluded.name
)
SELECT *
FROM locked_rows
JOIN updated_rows USING (id);

COMMIT;

single transaction to perform both the select and the update:

BEGIN;

SELECT *
FROM TEST
FOR UPDATE;

INSERT INTO TEST
VALUES 
(1, &#39;Chris&#39;),
(0, &#39;John&#39;) 
ON CONFLICT(id) DO
UPDATE
SET name = excluded.name
WHERE test.name &lt;&gt; excluded.name;

COMMIT;

the second CTE is also using the FOR UPDATE clause. However, the deadlock still occurs because the second transaction is trying to acquire an exclusive lock on a row that is already locked by the first transaction.

The reason why the second transaction is able to acquire a shared lock on the rows in the test table is because the first transaction has already acquired an exclusive lock on those rows. This is because the FOR UPDATE clause is a blocking lock. This means that the second transaction will wait until the first transaction releases the exclusive lock on the rows before it can acquire a shared lock on those rows.

However, when the second transaction tries to update a row that is already locked by the first transaction, a deadlock will occur because the second transaction is now trying to acquire an exclusive lock on the row. This will cause the first transaction to block, and the second transaction to block. This will create a deadlock situation that cannot be resolved.

To avoid this deadlock, you can either use the FOR UPDATE clause in both CTEs, or you can use a single transaction to perform both the select and the update.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

SELECT…FOR UPDATE如果在事务中的CTE中执行但后续未使用时为什么不起作用？

问题

答案1

答案2

在SQL数据库中存储约会，如Postgres，以供java.time框架使用。

SQL表格合并：数组列

使用SQL获取JSON元素而不指定记录索引？

How is a many to many self-relationship in data modeling represented in database tables and how is the query SQL instruction?

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论