第二位置的限制 – 索引未使用 – 为什么?

huangapple go评论59阅读模式
英文:

restriction in second position - index not used - why?

问题

我创建了以下示例,但不明白为什么计划程序不使用索引 i2 来执行查询。如pg_stats中所示,它知道列 uniqueIds 包含唯一值。它还知道列 fourOtherIds 仅包含4个不同的值。那么使用索引 i2 搜寻不是最快的方式吗?只在 fourOtherIds 的四个不同索引叶子中查找 uniqueIds?我对索引工作原理的理解有什么问题?为什么它认为在这里使用 i1 更有意义,尽管需要过滤掉333,333行?在我的理解中,应该使用 i2 首先找到具有 uniqueIds 4000的一行(或几行,因为没有唯一约束),然后将 fourIds = 1 作为过滤条件应用。

创建表tfourIds intuniqueIds intfourOtherIds int;
插入到t(选择1*5 from generate_series11000000));
插入到t(选择2*6 from generate_series10000012000000));
插入到t(选择3*7 from generate_series20000013000000));
插入到t(选择4*8 from generate_series30000014000000));
创建索引i1在tfourIds;
创建索引i2在tfourOtherIdsuniqueIds;
分析t;
选择n_distinctattname from pg_stats where tablename = 't';
/* 
n_distinct|attname     |
----------+------------+
       4.0|fourids     |
      -1.0|uniqueids   |
       4.0|fourotherids|
*/
解释分析选择* from t where fourIds = 1 and uniqueIds = 4000;
/*
QUERY PLAN                                                                                                                |
--------------------------------------------------------------------------------------------------------------------------+
Gather  (cost=1000.43..22599.09 rows=1 width=12) (actual time=0.667..46.818 rows=1 loops=1)                               |
  Workers Planned: 2                                                                                                      |
  Workers Launched: 2                                                                                                     |
  ->  Parallel Index Scan using i1 on t  (cost=0.43..21598.99 rows=1 width=12) (actual time=25.227..39.852 rows=0 loops=3)|
        Index Cond: (fourids = 1)                                                                                         |
        Filter: (uniqueids = 4000)                                                                                        |
        Rows Removed by Filter: 333333                                                                                    |
Planning Time: 0.107 ms                                                                                                   |
Execution Time: 46.859 ms                                                                                                 |
*/
英文:

I have created the below example and do not understand why the planner does not use index i2 for the query. As can be seen in pg_stats, it understands that column uniqueIds contains unique values. it also understands that column fourOtherIds contains only 4 different values. Shouldn't a search of index i2 then be by far the fastest way? Looking for uniqueIds in only four different index leaves of fourOtherIds? What is wrong with my understanding of how an index works? Why does it think using i1 makes more sense here, even though it has to filter out 333.333 rows? In my understanding it should use i2 to find the one row (or few rows, as there is no unique constraint) that has uniqueIds 4000 first and then apply where fourIds = 1 as a filter.

create table t (fourIds int, uniqueIds int,fourOtherIds int);
insert into t ( select 1,*,5 from generate_series(1      ,1000000));
insert into t ( select 2,*,6 from generate_series(1000001,2000000));
insert into t ( select 3,*,7 from generate_series(2000001,3000000));
insert into t ( select 4,*,8 from generate_series(3000001,4000000));
create index i1 on t (fourIds);
create index i2 on t (fourOtherIds,uniqueIds);
analyze t;
select n_distinct,attname from pg_stats where tablename = 't';
/* 
n_distinct|attname     |
----------+------------+
       4.0|fourids     |
      -1.0|uniqueids   |
       4.0|fourotherids|
*/
explain analyze select * from t where fourIds = 1 and uniqueIds = 4000;
/*
QUERY PLAN                                                                                                                |
--------------------------------------------------------------------------------------------------------------------------+
Gather  (cost=1000.43..22599.09 rows=1 width=12) (actual time=0.667..46.818 rows=1 loops=1)                               |
  Workers Planned: 2                                                                                                      |
  Workers Launched: 2                                                                                                     |
  ->  Parallel Index Scan using i1 on t  (cost=0.43..21598.99 rows=1 width=12) (actual time=25.227..39.852 rows=0 loops=3)|
        Index Cond: (fourids = 1)                                                                                         |
        Filter: (uniqueids = 4000)                                                                                        |
        Rows Removed by Filter: 333333                                                                                    |
Planning Time: 0.107 ms                                                                                                   |
Execution Time: 46.859 ms                                                                                                 |
*/

答案1

得分: 1

并非每种可能的优化都已经实施。您正在寻找一种变体,称为索引跳过扫描,也称为松散索引扫描。PostgreSQL并不会自动实现这些功能(至少目前还没有,尽管有人曾经在研究)。此外,我认为有第三方扩展或分支,比如citus,可能已经实现了这一功能。您可以使用递归CTE来自行模拟,但这可能会相当繁琐。【1】【1】:https://wiki.postgresql.org/wiki/Loose_indexscan

英文:

Not every conceivable optimization has been implemented. You are looking for a variant of an index skip scan AKA a loose index scan. PostgreSQL does not automatically implement those (yet--people were working on it but I don't know if they still are. Also, I think I've read that one of the 3rd party extensions/forks, citus maybe, has implemented it). You can emulate one yourself using a recursive CTE, but that would be quite annoying to do.

huangapple
  • 本文由 发表于 2023年2月8日 22:26:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/75387201.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定