PostgreSQL - checkpoint interval behaviour in different WAL levels
Question
I couldn't find a definitive answer to my concerns, so I might as well ask here.
Long story short:
We need to run an UPDATE command on roughly 400M rows. I know the command could be modified to work in batches, but that is a different topic. Our problem is that the WAL gets too big and we run out of disk space.
I'm wondering how checkpoint intervals work with different WAL levels. Put simply, the documentation says that a longer checkpoint interval "triggers" fewer full-page writes, which results in a smaller WAL. What I can't find is how this behaves with different wal_level settings.
DB version: Postgres 14.4
1. Does it have any relevance when wal_level is set to minimal? (Considering it removes almost all logging.)
2. Does it break the replicas when wal_level is set to replica or higher? (It isn't obvious to me from the articles and documentation I have read, but I assume the replicas should be fine, since all changes are still logged despite fewer full-page/block writes, and it could even be beneficial, e.g. a smaller WAL.)
We are in a position where a full backup and a shutdown of the related application are possible, so the minimal wal_level setting could work, but I'm interested in other solutions as well. Feel free to share your thoughts.
Cheers!
Answer 1
Score: 1
wal_level = minimal won't make a difference. As long as you don't set it to logical, PostgreSQL should produce about the same amount of WAL. Setting wal_level to anything lower than replica will break replication.
The obvious solution is to add more disk space. If the problem is WAL archiving, you can disable archive_mode. If the problem is that checkpoints take too long to complete, you could run a manual CHECKPOINT command.
Increase max_wal_size to reduce the amount of WAL written. Yes, I know that sounds strange, but max_wal_size does not govern the size of pg_wal; it triggers checkpoints. The first change to each page after a checkpoint is logged as a full-page image, so more frequent checkpoints mean more full-page images and more WAL.
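The suggestions above could be applied roughly as follows on the PostgreSQL 14 server from the question. This is a sketch rather than a tuned configuration: the 16GB value is a placeholder chosen for illustration, and the statements assume a superuser session outside a transaction block.

```sql
-- Placeholder value (an assumption); size it against the free space on the pg_wal volume.
ALTER SYSTEM SET max_wal_size = '16GB';

-- max_wal_size can be changed without a restart; reloading the configuration is enough.
SELECT pg_reload_conf();

-- Confirm the value the server is now using.
SHOW max_wal_size;

-- Force an immediate checkpoint (requires superuser on PostgreSQL 14).
CHECKPOINT;

-- Disabling archiving means changing archive_mode, which only takes effect after a server restart.
-- ALTER SYSTEM SET archive_mode = 'off';
```

ALTER SYSTEM writes the setting to postgresql.auto.conf, so it persists across restarts until it is changed again or removed with ALTER SYSTEM RESET.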
Answer 2
Score: 0
> Does it have any relevance with a minimal wal_level setting? (Considering it removes almost all logging.)
It doesn't. With minimal, you only skip WAL logging of a few things, like COPY into a table which was created or truncated in the same transaction, or the creation of indexes. Those special cases wouldn't apply to a bulk UPDATE.
To solve the problem, you first need to figure out what the root problem is. Are you so close to running out of space under normal conditions that any stress at all can push you over? Do you have replication slots, and the standbys can't keep up? Do you have an archive_command that can't keep up? Is your IO system so overwhelmed that the checkpoints can't finish in time despite going as fast as they can? Is your max_wal_size writing checks your hard drive can't cash?
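Most of these questions can be checked from the statistics views. The queries below are one possible starting point, assuming PostgreSQL 14 (as in the question), a session on the primary, and a role that is a superuser or a member of pg_monitor; the column aliases are only illustrative.

```sql
-- Replication slots: how much WAL is each slot forcing the server to retain?
SELECT slot_name,
       active,
       wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

-- Archiver: a growing failed_count or a stale last_archived_time suggests
-- that archive_command is failing or not keeping up.
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;

-- Checkpoints: checkpoints_req counts requested checkpoints (for example, when
-- max_wal_size is reached); checkpoints_timed counts those started by the timeout.
SELECT checkpoints_timed, checkpoints_req,
       checkpoint_write_time, checkpoint_sync_time
FROM pg_stat_bgwriter;

-- Current number of files in pg_wal and their total size.
SELECT count(*) AS files, pg_size_pretty(sum(size)) AS total_size
FROM pg_ls_waldir();
```

Running the last query before and during the UPDATE shows how quickly pg_wal grows, and comparing it with the slot and archiver output indicates what is keeping old segments from being removed.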
Answer 3
Score: 0
Make sure of a few points:
- that checkpoint_timeout and max_wal_size are not too small.
- that your archive_command is working.
These points are important to avoid flooding the I/O system.
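A quick way to review the relevant settings is to read them from pg_settings. In this sketch, checkpoint_timeout is the PostgreSQL parameter that sets the time-based checkpoint interval, and archive_mode and wal_level are included only for context.

```sql
-- Show the current values, their units, and where each value was set.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('checkpoint_timeout', 'max_wal_size',
               'archive_mode', 'archive_command', 'wal_level');
```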