Unexpected number of dead tuples after several updates

Question

I have a question about the number of dead tuples after an update workload in PostgreSQL.

Because of PostgreSQL's MVCC implementation, many updates cause tables to bloat, and we can track this through the number of dead tuples in the table. In the following scenario, however, I can't make sense of the dead tuple count.

I am updating the pgbench_accounts table with the following test case:

First, I turned off autovacuum system-wide:
...=# alter system set autovacuum to off;
...=# select pg_reload_conf();

Then I loaded the pgbench tables:
pgbench -i
-- Now pgbench_accounts has 100,000 tuples

Then I updated randomly chosen tuples as follows:
pgbench -t 10000 -f update-only -n

The update-only script contains this transaction:
   \set aid random(1, 100000 * :scale)
   \set delta random(-5000, 5000)
   BEGIN;
   UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
   END;

As you can guess, I want to update 10,000 tuples with a series of simple update transactions like the one above.

After the run finished, I saw:
9543 updated tuples. That seems OK, because some tuples were updated twice or more.
The problem is that I see only 3277 dead tuples.

Although I turned off autovacuum, some dead tuples disappeared. How does this happen? I expected to see at least as many dead tuples as updated tuples.
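A quick sanity check of the question's arithmetic, added as an illustration. It assumes aid is drawn uniformly from 1 to 100,000 (scale 1) and that pgbench ran 10,000 transactions with its default single client. The expected number of distinct rows hit is N(1 - (1 - 1/N)^T), which lands in the same range as the 9543 reported. The function names here are hypothetical helpers, not part of pgbench.

```python
import random


def expected_distinct_rows(n_rows, n_updates):
    """Expected number of distinct rows hit by n_updates uniform random picks."""
    return n_rows * (1 - (1 - 1 / n_rows) ** n_updates)


def simulate_distinct_rows(n_rows, n_updates, seed=0):
    """Draw aid values like the script's random(1, n_rows) and count distinct ones."""
    rng = random.Random(seed)
    return len({rng.randint(1, n_rows) for _ in range(n_updates)})


N_ROWS = 100_000     # rows in pgbench_accounts at scale 1
N_UPDATES = 10_000   # pgbench -t 10000 with one client

print(f"expected distinct rows: {expected_distinct_rows(N_ROWS, N_UPDATES):.0f}")
print(f"simulated distinct rows: {simulate_distinct_rows(N_ROWS, N_UPDATES)}")
# With no cleanup at all, every UPDATE leaves exactly one dead row version,
# so the dead tuple count could reach N_UPDATES, not just the distinct-row count.
print(f"dead versions without any cleanup: {N_UPDATES}")
```

The formula gives roughly 9,516 distinct rows. Rows updated more than once each leave one dead version per update, so without any cleanup the ceiling would be 10,000 dead tuples.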

Answer 1

Score: 1

It could be that not all updates were counted (the statistics are not necessarily 100% accurate), but more likely the difference is explained by HOT (heap-only tuple) updates.

If there is still room in the block and you don't update an indexed column, PostgreSQL can use a HOT update, which doesn't require modifying any index. In your test, abalance is not indexed (pgbench_accounts only has a primary key on aid), so these updates are HOT candidates. Dead tuples from HOT updates don't need VACUUM for cleanup: any SELECT or other statement that accesses the page (in your case, another UPDATE) can briefly lock the page and reorganize it ("pruning"), getting rid of the dead tuples.

You can verify that with:

SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_all_tables
WHERE relname = 'pgbench_accounts';
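The mechanism described in this answer can be illustrated with a deliberately simplified toy model. It is not PostgreSQL's actual algorithm: the page capacity, the initial fill, the pruning trigger (a page that is full when accessed), and the hypothetical `indexed_ratio` parameter (the fraction of updates that change an indexed column) are all assumptions. In the model, HOT updates leave dead versions that are pruned in place, while non-HOT updates leave dead versions that only VACUUM would remove.

```python
import random


class Page:
    """A heap page in a toy model: a fixed number of tuple slots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.live = 0       # current row versions stored on this page
        self.hot_dead = 0   # dead versions left by HOT updates (prunable in place)
        self.cold_dead = 0  # dead versions left by non-HOT updates (wait for VACUUM)

    def has_room(self):
        return self.live + self.hot_dead + self.cold_dead < self.capacity


def simulate(n_rows=1000, rows_per_page=6, capacity=8, n_updates=10_000,
             indexed_ratio=0.0, seed=1):
    """Run random single-row updates; return counters loosely like pg_stat's."""
    rng = random.Random(seed)
    pages, location = [], []  # location[row] = index of page holding its live version
    for _ in range(n_rows):
        if not pages or pages[-1].live == rows_per_page:
            pages.append(Page(capacity))
        pages[-1].live += 1
        location.append(len(pages) - 1)

    stats = {"updates": 0, "hot": 0, "pruned": 0, "dead": 0}
    for _ in range(n_updates):
        row = rng.randrange(n_rows)
        page = pages[location[row]]
        # Toy pruning: a page that is full when accessed drops its HOT-dead versions.
        if not page.has_room() and page.hot_dead:
            stats["pruned"] += page.hot_dead
            stats["dead"] -= page.hot_dead
            page.hot_dead = 0
        # Updates that change an indexed column can never be HOT.
        hot_eligible = rng.random() >= indexed_ratio
        if hot_eligible and page.has_room():
            page.hot_dead += 1   # old version dead, new version on the same page
            stats["hot"] += 1
        elif page.has_room():
            page.cold_dead += 1  # non-HOT, but the new version still fits here
        else:
            page.live -= 1       # the new version must go to another page
            page.cold_dead += 1
            if not pages[-1].has_room():
                pages.append(Page(capacity))
            pages[-1].live += 1
            location[row] = len(pages) - 1
        stats["updates"] += 1
        stats["dead"] += 1       # every update leaves one dead version behind
    return stats


print(simulate(indexed_ratio=0.0))  # all HOT: dead stays far below updates
print(simulate(indexed_ratio=1.0))  # never HOT: dead equals updates
```

With `indexed_ratio=0.0` (as in the question, where abalance is not indexed) every update is HOT and the dead-version counter stays far below the number of updates. With `indexed_ratio=1.0` no update is HOT, nothing is pruned, and the counter equals the number of updates.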

huangapple
  • Published on 2023-06-19 16:42:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76504997.html