Unexpected Number of Dead Tuples after Several Updates
Question
I have a question about the number of dead tuples after an update scenario in PostgreSQL.
Because of PostgreSQL's MVCC protocol, many updates cause tables to bloat. We can track this through the number of dead tuples in the table. In the following scenario, however, I can't make sense of the number of dead tuples.
I am updating the "pgbench_accounts" table with the following test case:
First, I turned off autovacuum in the system beforehand.
alter system set autovacuum to off;
select pg_reload_conf();
Then I load the pgbench tables.
> pgbench -i
-- Now, pgbench_accounts has 100,000 tuples
Then I update randomly chosen tuples as follows:
> pgbench -t10000 -f update-only -n
The update-only script contains the following transaction:
\set aid random(1, 100000 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
END;
As you can guess, I want to update 10,000 tuples with a series of simple update transactions like the one shown above.
After the run ends, I see:
9543 updated tuples. That seems OK, because some tuples were updated twice or more.
The problem is that I see only 3277 dead tuples.
Although I turned off autovacuum, some dead tuples disappeared. How does this happen? Shouldn't I see at least as many dead tuples as updated tuples?
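The question does not say how the 9543 updated tuples were counted; assuming it means distinct rows that were changed, the figure is in line with uniform random sampling. Drawing k = 10,000 aid values uniformly (with replacement) from N = 100,000 rows touches N * (1 - (1 - 1/N)^k) distinct rows on average. The Python sketch below (function names are hypothetical, chosen for illustration) computes that expectation and also simulates the pgbench draw. Note that each of the 10,000 UPDATE statements still writes a new row version and leaves one dead version behind, even when it hits a row that was already updated, so with no cleanup at all one would expect roughly 10,000 dead tuples, not just 9543.

```python
import random


def expected_distinct_rows(n_rows: int, n_updates: int) -> float:
    """Expected number of distinct rows hit by n_updates uniform picks
    (with replacement) from n_rows rows: N * (1 - (1 - 1/N)**k)."""
    return n_rows * (1.0 - (1.0 - 1.0 / n_rows) ** n_updates)


def simulate_distinct_rows(n_rows: int, n_updates: int, seed: int = 0) -> int:
    """Draw aid values like pgbench's random(1, n_rows) and count distinct ones."""
    rng = random.Random(seed)
    hit = {rng.randint(1, n_rows) for _ in range(n_updates)}
    return len(hit)


if __name__ == "__main__":
    print(round(expected_distinct_rows(100_000, 10_000)))  # prints 9516
    print(simulate_distinct_rows(100_000, 10_000))
```

A seeded random.Random keeps the simulation reproducible; the closed-form expectation avoids sampling noise altogether.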
Answer 1
Score: 1
It could be that not all updates were counted (statistics are not necessarily 100% accurate), but more likely the difference is explained by HOT (heap-only tuple) updates.
If there is still room in the block and you don't update an indexed column, PostgreSQL can use a HOT update, which doesn't require modifying the index. Dead tuples from HOT updates don't need VACUUM for cleanup. Any SELECT or other statement (in your case: UPDATE) can grab a brief lock on the page and reorganize it, getting rid of dead tuples.
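To illustrate the mechanism described above, here is a deliberately simplified Python toy model. It is not PostgreSQL's actual data structure or pruning algorithm, and all class and method names are hypothetical. It assumes a single page, serial transactions (so every dead version is invisible to everyone), and pruning that triggers only when the page is full. Under those assumptions, a statement that touches a full page first removes the dead versions in place, so the dead-tuple counter drops without any VACUUM.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHeapPage:
    """Toy model of one heap page holding versions of a few rows.

    Ignores line pointers, HOT chain redirects, indexes and snapshot
    visibility: every dead version is assumed invisible to all transactions.
    """
    capacity: int                                  # max versions that fit
    versions: list = field(default_factory=list)   # [row_id, is_live] pairs
    dead_count: int = 0                            # stand-in for a dead-tuple statistic
    dead_created: int = 0                          # dead versions ever produced

    def insert(self, row_id: int) -> None:
        self.versions.append([row_id, True])

    def prune(self) -> int:
        """Opportunistic cleanup: drop dead versions in place, no VACUUM."""
        before = len(self.versions)
        self.versions = [v for v in self.versions if v[1]]
        removed = before - len(self.versions)
        self.dead_count -= removed
        return removed

    def hot_update(self, row_id: int) -> bool:
        """Put the new version on the same page; prune first if the page is full."""
        if len(self.versions) >= self.capacity:
            self.prune()                # the accessing statement tidies up
        if len(self.versions) >= self.capacity:
            return False                # still no room: would not be a HOT update
        for v in self.versions:
            if v[0] == row_id and v[1]:
                v[1] = False            # old version becomes dead
                self.dead_count += 1
                self.dead_created += 1
                break
        else:
            raise KeyError(row_id)      # no live version of this row on the page
        self.versions.append([row_id, True])
        return True


if __name__ == "__main__":
    page = ToyHeapPage(capacity=4)
    page.insert(1)
    page.insert(2)
    for row in [1, 2, 1, 2, 1]:
        page.hot_update(row)
    print(page.dead_created, page.dead_count)  # prints 5 1
```

In the example run, five dead versions are produced but the counter ends at 1, because two prunes each reclaimed two versions. Real pruning only removes versions that no running transaction can still see, keeps line pointers, and fires when free space drops below a threshold rather than on a strict "full" test.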
You can verify that with:
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_all_tables
WHERE relname = 'pgbench_accounts';
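Interpreting the result of that query is simple arithmetic: n_tup_upd counts all updated rows (HOT updates included) and n_tup_hot_upd counts the HOT subset. The small Python helper below (a hypothetical name, for illustration) derives the non-HOT count and the HOT share; the numbers in the usage line are made up, not measurements from the question. A HOT share close to 1 would be consistent with the explanation above.

```python
def summarize_update_stats(n_tup_upd: int, n_tup_hot_upd: int) -> dict:
    """Derive simple figures from pg_stat_all_tables update counters.

    n_tup_upd includes HOT updates; n_tup_hot_upd is the HOT subset.
    """
    if not 0 <= n_tup_hot_upd <= n_tup_upd:
        raise ValueError("expected 0 <= n_tup_hot_upd <= n_tup_upd")
    non_hot = n_tup_upd - n_tup_hot_upd
    hot_share = n_tup_hot_upd / n_tup_upd if n_tup_upd else 0.0
    return {"non_hot_updates": non_hot, "hot_share": hot_share}


if __name__ == "__main__":
    # Hypothetical counter values, for illustration only.
    print(summarize_update_stats(10_000, 7_500))
    # prints {'non_hot_updates': 2500, 'hot_share': 0.75}
```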