Question

I have created the example below and do not understand why the planner does not use index i2 for the query. As pg_stats shows, the planner knows that column uniqueIds contains unique values. It also knows that column fourOtherIds contains only 4 distinct values. Shouldn't a search of index i2 then be by far the fastest way, looking for uniqueIds under only the four distinct values of fourOtherIds? What is wrong with my understanding of how an index works? Why does the planner think using i1 makes more sense here, even though it has to filter out 333,333 rows? In my understanding, it should use i2 to first find the one row (or few rows, as there is no unique constraint) with uniqueIds = 4000 and then apply fourIds = 1 as a filter.

create table t (fourIds int, uniqueIds int,fourOtherIds int);
insert into t ( select 1,*,5 from generate_series(1      ,1000000));
insert into t ( select 2,*,6 from generate_series(1000001,2000000));
insert into t ( select 3,*,7 from generate_series(2000001,3000000));
insert into t ( select 4,*,8 from generate_series(3000001,4000000));
create index i1 on t (fourIds);
create index i2 on t (fourOtherIds,uniqueIds);
analyze t;
select n_distinct,attname from pg_stats where tablename = 't';
/* 
n_distinct|attname     |
----------+------------+
       4.0|fourids     |
      -1.0|uniqueids   |
       4.0|fourotherids|
*/
explain analyze select * from t where fourIds = 1 and uniqueIds = 4000;
/*
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Gather  (cost=1000.43..22599.09 rows=1 width=12) (actual time=0.667..46.818 rows=1 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Index Scan using i1 on t  (cost=0.43..21598.99 rows=1 width=12) (actual time=25.227..39.852 rows=0 loops=3)
        Index Cond: (fourids = 1)
        Filter: (uniqueids = 4000)
        Rows Removed by Filter: 333333
Planning Time: 0.107 ms
Execution Time: 46.859 ms
*/
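The lookup the question has in mind can be sketched in Python, with a sorted list standing in for the B-tree behind i2 and `bisect` standing in for one index descent. This is only a model under stated assumptions: the data is scaled down to 4 groups of 10,000 rows, the helper names (`build_table`, `build_i2`, `i1_plan`, `skip_scan_i2`) are hypothetical, and it counts index work rather than reproducing PostgreSQL's cost estimates. It compares one probe per distinct `fourOtherIds` value (a skip scan) with the chosen plan, which fetches every row with `fourIds = 1` and filters them.

```python
import math
from bisect import bisect_left

# Hypothetical, scaled-down copy of table t: 4 groups of ROWS_PER_GROUP rows
# instead of 4 x 1,000,000. Each row is (fourIds, uniqueIds, fourOtherIds).
ROWS_PER_GROUP = 10_000


def build_table(n=ROWS_PER_GROUP):
    return [(g + 1, u, g + 5)
            for g in range(4)
            for u in range(g * n + 1, (g + 1) * n + 1)]


def build_i2(rows):
    # Stand-in for index i2: entries sorted by (fourOtherIds, uniqueIds),
    # each carrying the row's position in `rows` like a heap pointer.
    return sorted((r[2], r[1], pos) for pos, r in enumerate(rows))


def i1_plan(rows, four_ids, unique_id):
    # Roughly what the chosen plan does: collect every row matching the index
    # condition fourIds = four_ids, then apply uniqueIds = unique_id as a filter.
    # Returns (matching positions, number of candidate rows that were filtered).
    candidates = [pos for pos, r in enumerate(rows) if r[0] == four_ids]
    hits = [pos for pos in candidates if rows[pos][1] == unique_id]
    return hits, len(candidates)


def skip_scan_i2(index, rows, four_ids, unique_id):
    # Skip scan over i2: one descent per distinct fourOtherIds value.
    # Returns (matching positions, number of descents).
    hits, probes, pos, n = [], 0, 0, len(index)
    while pos < n:
        lead = index[pos][0]
        # Jump straight to the first entry >= (lead, unique_id).
        i = bisect_left(index, (lead, unique_id), lo=pos)
        probes += 1
        while i < n and index[i][0] == lead and index[i][1] == unique_id:
            row_pos = index[i][2]
            if rows[row_pos][0] == four_ids:  # fourIds = 1 applied as a filter
                hits.append(row_pos)
            i += 1
        # Skip every remaining entry that shares this leading value.
        pos = bisect_left(index, (lead, math.inf), lo=i)
    return hits, probes


rows = build_table()
i2 = build_i2(rows)
print(i1_plan(rows, 1, 4000))           # ([3999], 10000)
print(skip_scan_i2(i2, rows, 1, 4000))  # ([3999], 4)
```

Both find the same row; the model only shows the difference in work (four descents versus a whole group of candidates), not whether PostgreSQL's planner will consider such a plan.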

Answer 1

Score: 1

Not every conceivable optimization has been implemented. You are looking for a variant of an index skip scan, also known as a loose index scan. PostgreSQL does not implement these automatically (yet; people were working on it, but I don't know whether they still are. I also think I've read that one of the third-party extensions or forks, possibly Citus, has implemented it.) You can emulate one yourself using a recursive CTE (see https://wiki.postgresql.org/wiki/Loose_indexscan), but that would be quite annoying to do.
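A sketch of the recursive-CTE emulation the answer mentions, modelled on the approach described on the PostgreSQL wiki's Loose_indexscan page: the recursive part walks the distinct `fourOtherIds` values by repeatedly asking for the smallest value greater than the previous one, and each value is then joined with the `uniqueIds` lookup. To keep the example self-contained it runs the SQL through Python's built-in `sqlite3` module, which only checks that the query returns the right rows; it says nothing about which plan PostgreSQL would choose for it. The names `lead_values`, `load` and `find` are hypothetical, and the `?` placeholders are sqlite3 parameters (in psql you would write the literal values).

```python
import sqlite3

# Emulated skip scan: the recursive CTE yields each distinct fourOtherIds
# value as "the smallest value greater than the previous one" (one index
# probe per value); the final SELECT joins each value with uniqueIds = ?.
SKIP_SCAN_SQL = """
WITH RECURSIVE lead_values(v) AS (
    SELECT (SELECT fourOtherIds FROM t ORDER BY fourOtherIds LIMIT 1)
    UNION ALL
    SELECT (SELECT fourOtherIds FROM t
            WHERE fourOtherIds > lead_values.v
            ORDER BY fourOtherIds LIMIT 1)
    FROM lead_values
    WHERE lead_values.v IS NOT NULL
)
SELECT t.fourIds, t.uniqueIds, t.fourOtherIds
FROM lead_values
JOIN t ON t.fourOtherIds = lead_values.v AND t.uniqueIds = ?
WHERE t.fourIds = ?
"""


def load(conn, n=10_000):
    # Scaled-down version of the question's setup script (4 groups of n rows).
    conn.execute("CREATE TABLE t (fourIds int, uniqueIds int, fourOtherIds int)")
    rows = [(g + 1, u, g + 5)
            for g in range(4)
            for u in range(g * n + 1, (g + 1) * n + 1)]
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX i1 ON t (fourIds)")
    conn.execute("CREATE INDEX i2 ON t (fourOtherIds, uniqueIds)")


def find(conn, four_ids, unique_id):
    # Parameter order follows the placeholders: uniqueIds first, then fourIds.
    return conn.execute(SKIP_SCAN_SQL, (unique_id, four_ids)).fetchall()


conn = sqlite3.connect(":memory:")
load(conn)
print(find(conn, 1, 4000))  # [(1, 4000, 5)]
```

The recursion stops once the subquery finds no larger value and returns NULL, so the CTE produces the four leading values plus one NULL row, which matches nothing in the join.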
