Optimizing indexes for large PostgreSQL table


Question

I have a large table (location_history) with uid and geohash columns; together they form the primary key, and each is also indexed individually.

Indexes:
    "location_history_pkey" PRIMARY KEY, btree (uid, geohash)
    "ix_location_history_geohash" btree (geohash)
    "ix_location_history_uid" btree (uid)

But when I query the data with the following constraints, it runs really slowly (169 seconds!) even though the two columns are already indexed:

EXPLAIN (analyze, verbose, buffers) SELECT * FROM location_history WHERE uid = 123 AND geohash >= 'hzzzzzzzzzz';
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using location_history_pkey on public.location_history  (cost=0.71..84735.92 rows=22438 width=29) (actual time=1.434..114713.751 rows=189319 loops=1)
   Output: created_at, updated_at, uid, geohash, is_from_zenly
   Index Cond: ((location_history.uid = 123) AND ((location_history.geohash)::text >= 'hzzzzzzzzzz'::text))
   Buffers: shared hit=5822 read=179864 dirtied=132 written=19338
   I/O Timings: read=111433.601 write=2564.930
 Query Identifier: -7646491345250917333
 Planning Time: 0.103 ms
 Execution Time: 114746.908 ms
(8 rows)

And here is the result from pgstattuple():

SELECT * FROM pgstattuple(416708);
  table_len   | tuple_count |  tuple_len   | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
--------------+-------------+--------------+---------------+------------------+----------------+--------------------+------------+--------------
 812873588736 | 15570800464 | 685124233392 |         84.28 |             1506 |          66264 |                  0 |  394834592 |         0.05
(1 row)

The bloat_ratio is about 34%:

SELECT 100-(pgstatindex('location_history_pkey')).avg_leaf_density;

      ?column?
--------------------
 34.260000000000005
(1 row)

Could you share any tips to speed up the query? Thanks.

Answer 1

Score: 0

You could get a big improvement from an index-only scan, which you can get by including in the index all the columns needed by the query, apparently (uid, geohash, is_from_zenly, created_at, updated_at). In that case it would only need to visit the index pages and not the (randomly scattered) table pages, as long as the table is well vacuumed. You could get that "covering" index without partially duplicating the primary key index by doing:

alter table location_history
    add constraint location_history_pkey primary key (uid, geohash)
    include (is_from_zenly, created_at, updated_at);

Of course you would need to drop the current primary key first, so that would entail some downtime. You could instead just create the new index and live with the redundant indexes.
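
If downtime for swapping the primary key is a concern, a minimal sketch of the redundant-index route could look like the following (the index name is made up for illustration, and the INCLUDE list assumes the columns mentioned above); CREATE INDEX CONCURRENTLY builds the index without blocking writes:

-- Sketch only: hypothetical index name; adjust the INCLUDE list to the
-- columns your queries actually read.
CREATE INDEX CONCURRENTLY ix_location_history_covering
    ON location_history (uid, geohash)
    INCLUDE (is_from_zenly, created_at, updated_at);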

Of course if your real set of queries can contain more columns than just the ones you showed in your example, this might not be a very complete solution.
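
To check whether the covering index is actually paying off (assuming it has been created as sketched above), vacuum the table and re-run the EXPLAIN from the question; the plan should switch to an Index Only Scan with few or zero heap fetches:

-- Keep the visibility map current so the index-only scan rarely visits the heap.
VACUUM (ANALYZE, VERBOSE) location_history;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM location_history WHERE uid = 123 AND geohash >= 'hzzzzzzzzzz';
-- Look for "Index Only Scan" and a low "Heap Fetches" count in the output.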
