restriction in second position - index not used - why?
Question
I have created the example below and do not understand why the planner does not use index i2 for the query. As can be seen in pg_stats, it knows that column uniqueIds contains unique values. It also knows that column fourOtherIds contains only 4 distinct values. Shouldn't a search of index i2 then be by far the fastest way, looking for uniqueIds within only the four distinct fourOtherIds sections of the index? What is wrong with my understanding of how an index works? Why does the planner think that using i1 makes more sense here, even though it has to filter out 333,333 rows? In my understanding it should use i2 to first find the one row (or the few rows, since there is no unique constraint) with uniqueIds = 4000, and then apply fourIds = 1 as a filter.
create table t (fourIds int, uniqueIds int,fourOtherIds int);
insert into t ( select 1,*,5 from generate_series(1 ,1000000));
insert into t ( select 2,*,6 from generate_series(1000001,2000000));
insert into t ( select 3,*,7 from generate_series(2000001,3000000));
insert into t ( select 4,*,8 from generate_series(3000001,4000000));
create index i1 on t (fourIds);
create index i2 on t (fourOtherIds,uniqueIds);
analyze t;
select n_distinct,attname from pg_stats where tablename = 't';
/*
n_distinct|attname |
----------+------------+
4.0|fourids |
-1.0|uniqueids |
4.0|fourotherids|
*/
explain analyze select * from t where fourIds = 1 and uniqueIds = 4000;
/*
QUERY PLAN |
--------------------------------------------------------------------------------------------------------------------------+
Gather (cost=1000.43..22599.09 rows=1 width=12) (actual time=0.667..46.818 rows=1 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Parallel Index Scan using i1 on t (cost=0.43..21598.99 rows=1 width=12) (actual time=25.227..39.852 rows=0 loops=3)|
Index Cond: (fourids = 1) |
Filter: (uniqueids = 4000) |
Rows Removed by Filter: 333333 |
Planning Time: 0.107 ms |
Execution Time: 46.859 ms |
*/
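For comparison, the access path I have in mind would be available if the four known values of fourOtherIds were spelled out in the query, since the planner should then be able to match both columns of i2 in the index condition. This is only a sketch against the same schema; I have not measured it.

-- Hypothetical rewrite: listing the leading-column values explicitly lets the
-- planner use i2 with an index condition on both fourOtherIds and uniqueIds.
explain analyze
select *
from t
where fourIds = 1
  and uniqueIds = 4000
  and fourOtherIds in (5, 6, 7, 8);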
Answer 1
Score: 1
Not every conceivable optimization has been implemented. You are looking for a variant of an index skip scan, also known as a loose index scan [1]. PostgreSQL does not implement those automatically (yet; people were working on it, but I don't know whether they still are, and I think one of the third-party extensions/forks, Citus maybe, has implemented it). You can emulate one yourself using a recursive CTE, but that would be quite annoying to do.

[1]: https://wiki.postgresql.org/wiki/Loose_indexscan
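To make the recursive CTE idea concrete, below is a rough sketch adapted from the loose index scan recipe on that wiki page, using the table and column names from the question (untested beyond the schema above). The recursive CTE walks i2 to enumerate the distinct values of its leading column, and the join should then let the planner probe i2 once per value instead of scanning i1.

-- Enumerate the distinct fourOtherIds values through repeated min() lookups
-- (each one is a single descent into i2), then probe i2 for uniqueIds = 4000
-- under each value; fourIds = 1 is applied as a final filter.
with recursive leading_vals as (
    select min(fourOtherIds) as v from t
  union all
    select (select min(fourOtherIds) from t where fourOtherIds > lv.v)
    from leading_vals lv
    where lv.v is not null
)
select t.*
from leading_vals lv
join t on t.fourOtherIds = lv.v and t.uniqueIds = 4000
where t.fourIds = 1;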