Optimal PostgreSQL query with a compound index on 3 columns, when 2 columns have static values and the 3rd uses the IN operator
Question
There's a table with a compound index on (col1, col2, col3), and the table holds a lot of data.
I want to build a query such as WHERE col1 = 2 AND col2 = 12 AND col3 IN (1, 2, 3, ..., 40).
Is there a way to make it use the index fully (all 3 columns)?
When I try
SELECT *
FROM table t
WHERE t.col1 = 2
AND t.col2 = 12
AND t.col3 IN (1, 2, 3, ..., 40)
the Postgres planner does an index scan on (col1, col2) and then uses SeqScan to filter, one by one, the 400k rows against col3 IN (1, 2, 3, ..., 40).
If I try
SELECT *
FROM table t
WHERE (col1, col2, col3) IN (VALUES (2, 12, 1), (2, 12, 2), (2, 12, 3), ... ,(2, 12, 40))
it gives the error:
> temporary file size exceeds temp_file_limit
So this is slow as well. Is there a way to make Postgres use the compound index for all 3 columns?
Answer 1
Score: 0
You could try loading the possible col3 values into a bona fide table and then rewriting the query as follows:
SELECT t1.*
FROM table1 t1
WHERE t1.col1 = 2 AND
t1.col2 = 12 AND
EXISTS (
SELECT 1
FROM table2 t2
WHERE t2.col3 = t1.col3
);
This assumes that table2 has the following structure:
table2:
col3
1
2
3
...
40
table1 might be able to use an index on (col1, col2, col3). An index should also be placed on table2 (col3), to ensure rapid lookups.
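As a rough illustration of the lookup-table approach, here is a minimal Python sketch that runs the EXISTS query against an in-memory SQLite database. SQLite is only a stand-in so the sketch runs without a server; it says nothing about which plan PostgreSQL would choose. The sample rows, the index name `table1_c1_c2_c3`, and the helper `matching_rows` are made up for the example.

```python
import sqlite3


def matching_rows(conn, col1, col2):
    """Return table1 rows with the given col1/col2 whose col3 is listed in table2."""
    cur = conn.execute(
        """
        SELECT t1.col1, t1.col2, t1.col3
        FROM table1 t1
        WHERE t1.col1 = ? AND t1.col2 = ?
          AND EXISTS (SELECT 1 FROM table2 t2 WHERE t2.col3 = t1.col3)
        ORDER BY t1.col3
        """,
        (col1, col2),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (col1 INTEGER, col2 INTEGER, col3 INTEGER);
    -- Compound index as described in the question (name is hypothetical).
    CREATE INDEX table1_c1_c2_c3 ON table1 (col1, col2, col3);
    -- Lookup table holding the wanted col3 values; the primary key gives it an index.
    CREATE TABLE table2 (col3 INTEGER PRIMARY KEY);
    """
)

# Made-up sample data: only the first two rows satisfy all three conditions.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(2, 12, 1), (2, 12, 2), (2, 12, 99), (2, 13, 1), (3, 12, 2)],
)
conn.executemany("INSERT INTO table2 VALUES (?)", [(1,), (2,), (3,)])

rows = matching_rows(conn, 2, 12)
print(rows)
```

On PostgreSQL itself, running the query under EXPLAIN is the way to see whether the three-column index is actually used.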
Answer 2
Score: 0
Based on your comment, it looks like we can force use of the index through an explicit join on (col1, col2, col3) = (arg1, arg2, arg3).
I don't know how you are calling this query, but if it is called from a host language that allows passing an int[] type through the database driver, my query would look like this:
with invars as (
select 2 as c1val, 12 as c2val,
array[1, 2, 3, 4, 5, 6, 40] as c3vals
), search_tuples as (
select i.c1val, i.c2val, u.c3val
from invars i
cross join lateral unnest(i.c3vals) as u(c3val)
)
select t.*
from search_tuples s
join table1 t
on (t.col1, t.col2, t.col3) = (s.c1val, s.c2val, s.c3val);
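For the host-language side, a minimal Python sketch of how the three inputs might be passed as parameters instead of literals. It assumes a DB-API driver with `%s` placeholders that adapts a Python list to a PostgreSQL array (psycopg does this); the driver is deliberately not imported so the sketch runs on its own. The function name `build_lookup` and the explicit `::int` / `::int[]` casts are additions for the example, not part of the query above.

```python
def build_lookup(col1, col2, col3_values):
    """Return (sql, params) for the join query, ready for cursor.execute()."""
    sql = """
        with invars as (
          select %s::int as c1val, %s::int as c2val,
                 %s::int[] as c3vals
        ), search_tuples as (
          select i.c1val, i.c2val, u.c3val
            from invars i
                 cross join lateral unnest(i.c3vals) as u(c3val)
        )
        select t.*
          from search_tuples s
               join table1 t
                 on (t.col1, t.col2, t.col3) = (s.c1val, s.c2val, s.c3val)
    """
    # Parameter order must match the order of the placeholders in the SQL text.
    params = (col1, col2, list(col3_values))
    return sql, params


sql, params = build_lookup(2, 12, [1, 2, 3, 40])
# With a real connection (assumed, not shown): cur.execute(sql, params)
print(params)
```

Passing the whole list as one array parameter keeps the statement text the same regardless of how many col3 values are searched.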