Optimizing indexes for large PostgreSQL table
Question
I have a large table (location_history) with uid and geohash; together they form the primary key, and each column is also indexed individually.
Indexes:
"location_history_pkey" PRIMARY KEY, btree (uid, geohash)
"ix_location_history_geohash" btree (geohash)
"ix_location_history_uid" btree (uid)
But when I query the data with the following constraints, it runs really slowly (169 seconds!) even though the two columns are already indexed:
EXPLAIN (analyze, verbose, buffers) SELECT * FROM location_history WHERE uid = 123 AND geohash >= 'hzzzzzzzzzz';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using location_history_pkey on public.location_history (cost=0.71..84735.92 rows=22438 width=29) (actual time=1.434..114713.751 rows=189319 loops=1)
Output: created_at, updated_at, uid, geohash, is_from_zenly
Index Cond: ((location_history.uid = 123) AND ((location_history.geohash)::text >= 'hzzzzzzzzzz'::text))
Buffers: shared hit=5822 read=179864 dirtied=132 written=19338
I/O Timings: read=111433.601 write=2564.930
Query Identifier: -7646491345250917333
Planning Time: 0.103 ms
Execution Time: 114746.908 ms
(8 rows)
And here is the result from pgstattuple():
SELECT * FROM pgstattuple(416708);
table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
--------------+-------------+--------------+---------------+------------------+----------------+--------------------+------------+--------------
812873588736 | 15570800464 | 685124233392 | 84.28 | 1506 | 66264 | 0 | 394834592 | 0.05
(1 row)
The bloat_ratio is about 34%:
SELECT 100-(pgstatindex('location_history_pkey')).avg_leaf_density;
?column?
--------------------
34.260000000000005
(1 row)
Could you share any tips to speed up the query? Thanks.
Answer 1
Score: 0
You could get a big improvement from an index-only scan, which you could get by including in the index all the columns needed by the query, apparently (uid, geohash, is_from_zenly, created_at, updated_at). In that case it would only need to visit the index pages and not the (randomly scattered) table pages, as long as the table is well vacuumed. You could get that "covering" index without partially duplicating the primary key index by doing:
alter table location_history add constraint location_history_pkey primary key (uid, geohash) include (is_from_zenly, created_at, updated_at);
Of course you would need to drop the current primary key first, so that would entail some downtime. You could instead just create the new index and live with the redundant indexes.
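For reference, the redundant-index route could look something like the sketch below (the index name is made up here; CREATE INDEX CONCURRENTLY avoids blocking writes during the build, and INCLUDE requires PostgreSQL 11 or later):

-- hypothetical index name; builds the covering index without locking out writers
CREATE INDEX CONCURRENTLY ix_location_history_uid_geohash_cov
    ON location_history (uid, geohash)
    INCLUDE (is_from_zenly, created_at, updated_at);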
Of course if your real set of queries can contain more columns than just the ones you showed in your example, this might not be a very complete solution.
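Also, because index-only scans rely on the visibility map, it is worth keeping the table well vacuumed; if autovacuum is not keeping up, a manual run could be as simple as:

-- refreshes the visibility map so index-only scans can skip most heap fetches
VACUUM (VERBOSE, ANALYZE) location_history;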