Optimal PostgreSQL query with a compound index on 3 columns, when 2 columns have fixed values and the 3rd uses the IN operator

Question

There's a table with a compound index on (col1, col2, col3), and it holds a lot of data.
I want to build a query such as WHERE col1 = 2 AND col2 = 12 AND col3 IN (1, 2, 3, ..., 40).
Is there a way to use the index fully (all 3 columns)?

When I try

SELECT *
FROM table t
WHERE t.col1 = 2
    AND t.col2 = 12
    AND t.col3 IN (1, 2, 3, ..., 40)

the Postgres planner does an index scan on (col1, col2) only and then checks col3 IN (1, 2, 3, ..., 40) as a row-by-row filter over the 400k rows that scan returns.
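
One way to check which columns the planner actually uses is to compare the `Index Cond` and `Filter` lines of the plan. The sketch below is self-contained and assumes a hypothetical scratch table `demo_t` filled with synthetic data (not the asker's table or data). PostgreSQL represents `col3 IN (...)` internally as `col3 = ANY ('{...}')`, and a B-tree index can accept that form as an index condition, so with a usable three-column index all three columns would typically appear under `Index Cond`.

```sql
-- Hypothetical scratch table; names and data are illustrative only.
CREATE TEMP TABLE demo_t (
    col1    int  NOT NULL,
    col2    int  NOT NULL,
    col3    int  NOT NULL,
    payload text
);

-- Synthetic data: 5 x 50 x 1000 = 250,000 rows, one per (col1, col2, col3).
INSERT INTO demo_t (col1, col2, col3, payload)
SELECT a, b, c, md5(random()::text)
FROM generate_series(1, 5)    AS a,
     generate_series(1, 50)   AS b,
     generate_series(1, 1000) AS c;

-- Compound index matching the one described in the question.
CREATE INDEX demo_t_col1_col2_col3_idx ON demo_t (col1, col2, col3);
ANALYZE demo_t;

-- Same shape as the question's query; in the output, look for
-- col3 = ANY (...) under "Index Cond" rather than under "Filter".
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM demo_t t
WHERE t.col1 = 2
  AND t.col2 = 12
  AND t.col3 IN (1, 2, 3, 10, 40);
```

If `col3` shows up only under `Filter` on the real table, the third index column is not being used for the lookup.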

If I try

SELECT *
FROM table t
WHERE (col1, col2, col3) IN (VALUES (2, 12, 1), (2, 12, 2), (2, 12, 3), ..., (2, 12, 40))

it fails with this error:

> temporary file size exceeds temp_file_limit

So it is slow. Is there a way to make Postgres use the compound index on all 3 columns?

Answer 1

Score: 0

You could try loading the possible col3 values into a bona fide table and then rewriting the query as follows:

SELECT t1.*
FROM yourTable t1
WHERE t1.col1 = 2 AND
      t1.col2 = 12 AND
      EXISTS (
          SELECT 1
          FROM table2 t2
          WHERE t2.col3 = t1.col3
      );

This assumes that table2 has the following structure:

table2:
col3
1
2
3
...
40

yourTable (t1) might then be able to use its index on (col1, col2, col3). You should also put an index on table2 (col3) to ensure rapid lookups.
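
The query above presumes that `table2` already exists. A minimal sketch of creating and filling it, assuming `col3` is an integer column and the wanted values are exactly 1 through 40 (both assumptions; adjust to the real data). Declaring `col3` as the primary key gives `table2` the index on `col3` that this answer recommends, without a separate `CREATE INDEX`.

```sql
-- Lookup table of wanted col3 values (assumed: integers 1..40).
CREATE TABLE table2 (
    col3 int PRIMARY KEY  -- the primary key also serves as the index on table2 (col3)
);

-- Fill it without typing out all 40 values by hand.
INSERT INTO table2 (col3)
SELECT g
FROM generate_series(1, 40) AS g;

-- Give the planner accurate statistics for the small lookup table.
ANALYZE table2;
```

Using `generate_series` keeps the value list in one place; if the wanted values are not a contiguous range, a plain multi-row `INSERT ... VALUES` works the same way.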

Answer 2

Score: 0

Based on your comment, it looks like we can force use of the index through an explicit join on (col1, col2, col3) = (arg1, arg2, arg3).

I don't know how you are calling this query, but if called from a host language that allows passing an int[] type through the database driver, my query would look like this:

with invars as (
  select 2 as c1val, 12 as c2val,
         array[1, 2, 3, 4, 5, 6, 40] as c3vals  
), search_tuples as (
  select i.c1val, i.c2val, u.c3val
    from invars i
         cross join lateral unnest(i.c3vals) as u(c3val)
)
select t.*
  from search_tuples s
       join table1 t 
    on (t.col1, t.col2, t.col3) = (s.c1val, s.c2val, s.c3val);

A working fiddle with random test records and EXPLAIN output
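
The same query can take its three inputs as parameters, which is roughly what a driver that binds an `int[]` would do. A minimal sketch using a server-side `PREPARE`, where the statement name `find_by_keys` is hypothetical and a small temporary `table1` with synthetic rows stands in for the real table. The `invars` CTE is dropped here, since the parameters play that role directly.

```sql
-- Hypothetical stand-in for table1 so the sketch runs on its own.
CREATE TEMP TABLE table1 (col1 int, col2 int, col3 int);
INSERT INTO table1 (col1, col2, col3)
SELECT a, b, c
FROM generate_series(1, 3)   AS a,
     generate_series(10, 13) AS b,
     generate_series(1, 50)  AS c;
CREATE INDEX ON table1 (col1, col2, col3);
ANALYZE table1;

-- $1 = col1 value, $2 = col2 value, $3 = array of col3 values.
-- Each unnested array element becomes one (col1, col2, col3) lookup tuple.
PREPARE find_by_keys(int, int, int[]) AS
SELECT t.*
FROM unnest($3) AS u(c3val)
JOIN table1 t
  ON (t.col1, t.col2, t.col3) = ($1, $2, u.c3val);

-- Returns one row per matching (2, 12, c3val) tuple.
EXECUTE find_by_keys(2, 12, ARRAY[1, 2, 3, 4, 5, 6, 40]);

DEALLOCATE find_by_keys;
```

Whether the planner turns this into a nested loop of index lookups on all three columns depends on the real table's statistics, so it is worth confirming with `EXPLAIN`.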

huangapple
  • Posted on 2023-07-27 22:23:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76780700.html