2023年6月26日 06:15:48go评论90阅读模式

英文:

Why don't I see partition pruning for my query based on generate_series()?

问题

我正在使用 PostgreSQL 12 中的声明式分区：

create table foo (
    id  integer not null,
    date timestamp not null,
    count integer default 0,
    primary key (id, date)
)
    partition by RANGE (date);

使用这个简单的请求，Postgres 会搜索唯一的一个适当分区 - 这很好：

select sum(foo.count) as total,
       date_trunc('day', foo.date) as day_date
from foo
where foo.date between '2023-01-01' and '2023-01-02'
group by day_date

然而，当我使用 generate_series() 修改这个查询时，Postgres 开始搜索所有分区：

with times as (
    select generate_series('2023-01-01 12:00', '2023-01-02 16:00', '1 hour') as date1,
           generate_series('2023-01-01 16:00', '2023-01-02 20:00', '1 hour') as date2
)
select sum(foo.count) as total,
       times.date1
from times join foo on foo.date between times.date1 and times.date2
group by times.date1;

我假设查询规划器在执行查询之前会计划忽略一些分区，只是通过查看查询来确定。这意味着我不能使用动态生成的参数来进行筛选吗？是否有一些方法可以解决这个问题？

英文:

I'm using declarative partitioning in PostgreSQL 12:

create table foo (
    id  integer not null,
    date timestamp not null,
    count integer default 0,
    primary key (id, date)
)
    partition by RANGE (date);

With this simple request, Postgres searches through the only one proper partition - which is great:

select sum(foo.count) as total,
       date_trunc(&#39;day&#39;, foo.date) as day_date
from foo
where foo.date between &#39;2023-01-01&#39; and &#39;2023-01-02&#39;
group by day_date

However, when I modify this query using generate_series(), Postgres starts to search through all partitions:

with times as (
    select generate_series(&#39;2023-01-01 12:00&#39;, &#39;2023-01-02 16:00&#39;, &#39;1 hour&#39;) as date1,
           generate_series(&#39;2023-01-01 16:00&#39;, &#39;2023-01-02 20:00&#39;, &#39;1 hour&#39;) as date2
)
select sum(foo.count) as total,
       times.date1
from times join foo on foo.date between times.date1 and times.date2
group by times.date1;

I assume, the query planner plans to ignore some partitions before executing the query just by looking at it. Which means, I cannot use dynamically generated parameters for filtering?
Are there some approaches to solve this?

答案1

得分: 3

最有可能是你忽略了对于你的第二个查询，分区也被修剪了。

优先修正你的数据库设计和查询

不要为时间戳使用误导性的名称 "date"，改为使用 "ts"。
将你的主键更改为 PRIMARY KEY (ts, id)。功能上是一样的，但是在PK索引中，将 "ts" 作为主导列对你的查询有很大的影响。参见：
- 复合索引对于第一个字段的查询也有效吗？
不要对时间戳使用 BETWEEN，它包括上限，而分区边界排除上限（就像任何合理的时间戳范围操作一样）。手册：

> 在创建范围分区时，使用 FROM 指定的下限是一个包含的下限，而使用 TO 指定的上限是一个排除的上限。

否则，你会引发各种与边界不匹配的混淆。

可能已经（可能）工作的解释

我们需要打开 enable_partition_pruning - 你显然已经打开了，并且默认情况下也是开启的。这涵盖了两种不同的分区修剪方法：一种是在计划查询时，另一种是在执行期间。手册：

> 启用或禁用查询规划器消除分区表的分区出现在查询计划中的能力。这还控制规划器生成的查询计划是否允许查询执行器在查询执行期间删除（忽略）分区。默认值为 on。详细信息请参见第 5.11.4 节。

你的第一个简单查询具有常量过滤器，因此可以使用第一种方法，因为在规划时已知值。对于你的第二个查询情况不同：值稍后生成。因此只有第二种变体是可能的 - 它是在Postgres 11中添加的。发布说明：

> 允许在查询执行期间消除分区（David Rowley，Beena Emerson）

> 以前，分区消除只在计划时发生，这意味着许多连接和准备查询无法使用分区消除。

你声明了使用Postgres 12，所以应该可以工作。此外，升级到最新版本也不会有问题。Postgres 13 的发布说明：

> 允许在更多情况下修剪分区（Yuzuko Hosoya，Amit Langote，Álvaro Herrera）

更改在 pgsql-hackers 上的这个帖子中讨论。但是你的简单查询在Postgres 12中应该也可以获得分区修剪。

在 EXPLAIN 中，执行期间的分区修剪不会（也不能）显示在简单查询计划中。你必须使用 EXPLAIN (ANALYZE) 进行测试，然后你将看到对于始终被修剪的分区显示为 (never executed)。手册：

> [...] 在此阶段确定分区是否被修剪需要仔细检查 EXPLAIN ANALYZE 输出中的 loops 属性。与不同分区对应的子计划可能根据每个子计划在执行期间被修剪的次数而具有不同的值。如果它们每次都被修剪，一些可能显示为 (never executed)。

在这个 fiddle 中检查查询计划！

你的第一个（改进的！）查询的查询计划（即使来自纯粹的 EXECUTE）在规划时显示分区修剪：只涉及第一个分区。

你的第二个（改进的！）查询的查询计划（即使来自纯粹的 EXECUTE）对于分区 3 显示了 (never executed)，它一直被修剪，但对于分区 1 和 2，它们仅在某些迭代中被修剪。

Fix your DB design and query first

Don't use the misleading name "date" for a timestamp. Proceeding with "ts" instead.
Change your primary key to PRIMARY KEY (ts, id). Functionally the same, but ts as leading column in the PK index makes a big difference for your queries. See:
- Is a composite index also good for queries on the first field?
Don't use BETWEEN for timestamps, which includes the upper bound, while partition boundaries exclude the upper bound (like any sane range operation on timestamps). The manual:

> When creating a range partition, the lower bound specified with FROM
> is an inclusive bound, whereas the upper bound specified with TO is
> an exclusive bound.

Else, you invite all kinds of confusion with bounds that don't match.

Explanation how it (probably) works already

We need enable_partition_pruning turned on - which you obviously have, and is default anyway. This covers two distinct methods of partition pruning: one while planning the query, another one during execution. The manual:

> Enables or disables the query planner's ability to eliminate a
> partitioned table's partitions from query plans. This also controls
> the planner's ability to generate query plans which allow the query
> executor to remove (ignore) partitions during query execution. The
> default is on. See Section 5.11.4 for details.

Your first, simple query with a constant filter can use the first method because values are known at planning time. That's not the case for your second query: values are generated later. So only the second variant is possible - which was added with Postgres 11. The release notes:

> Allow partition elimination during query execution (David Rowley, Beena Emerson)

> Previously, partition elimination only happened at planning time, meaning many joins and prepared queries could not use partition elimination.

You declared Postgres 12, so it should work.
Aside, it won't hurt to upgrade to the latest version. The release notes Postgres 13:

> Allow pruning of partitions to happen in more cases (Yuzuko Hosoya, Amit Langote, Álvaro Herrera)

Changes were discussed in this thread on pgsql-hackers.
But your simple query should get partition pruning in Postgres 12, too.

The second method of partition pruning during execution does not (cannot) show in the simple query plan from EXPLAIN. You have to test with EXPLAIN (ANALYZE), then you'll see (never executed) for partitions that are always pruned. The manual:

> [...] Determining if partitions were pruned during this phase requires
> careful inspection of the loops property in the EXPLAIN ANALYZE
> output. Subplans corresponding to different partitions may have
> different values for it depending on how many times each of them was
> pruned during execution. Some may be shown as (never executed) if
> they were pruned every time.

Inspect the query plans in this fiddle!

The query plan (even from plain EXECUTE) for your first (improved!) query shows partition pruning during planning: only the first partition is involved to begin with.

The query plan (even from plain EXECUTE) for your second (improved!) query shows (never executed) for partition 3, which is pruned all the time, but not for partition 1 & 2, which are only pruned for some of the iterations.

How to utilize partition pruning with subqueries or joins?

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

为什么我在基于generate_series()的查询中看不到分区修剪？

问题

答案1

优先修正你的数据库设计和查询

可能已经（可能）工作的解释

Fix your DB design and query first

Explanation how it (probably) works already

Connection made to the postgres server,but datas refuse to be fetched from table "students"of database "student"

为什么在PostgreSQL的表空间中需要TABLESPACE_VERSION_DIRECTORY？

PostgreSQL插入具有单列主键的自增表

关于在RDS/Aurora PostgreSQL上创建数据库的警报

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。