大表分页的最佳实践

huangapple go评论58阅读模式
英文:

Best practice for pagination big table

问题

我有一个相当大的表,约有100百万行:

create table transfer
(
    id                  bigint                   not null primary key,
    db_updated_at       timestamp with time zone,
    ...
);

我需要按照db_updated_at降序和id降序的顺序迭代它。有一个带有REST API的应用程序,它生成类似这样的查询:

SELECT * FROM transfer
WHERE "db_updated_at" < '2022-11-18 23:38:44+03' OR (db_updated_at = '2022-11-18 23:38:44+03' and id < 154998555017734)
ORDER BY "db_updated_at" DESC, "id" DESC LIMIT 100

但它非常慢(在解释计划中我看到这个):

Limit  (cost=0.56..26.92 rows=10 width=230) (actual time=182494.092..182494.273 rows=10 loops=1)
  ->  Index Scan using transfer_db_updated_at_id_desc on transfer  (cost=0.56..55717286.28 rows=21142503 width=230) (actual time=182494.089..182494.266 rows=10 loops=1)
        Filter: ((db_updated_at < '2022-11-18 20:38:44+00'::timestamp with time zone) OR ((db_updated_at = '2022-11-18 20:38:44+00'::timestamp with time zone) AND (id < '154998555017734'::bigint)))
        Rows Removed by Filter: 16040385
Planning Time: 0.364 ms
Execution Time: 182494.312 ms

在有两个字段排序的情况下,迭代表的最佳实践是什么?我不能仅按id迭代,因为我需要一个特定的时间段。

英文:

I have a pretty big table ~100m rows:

create table transfer
(
    id                  bigint                   not null primary key,
    db_updated_at       timestamp with time zone,
    ...
);

I need to iterate it ordering by db_updated_at desc, id desc.
There's application with REST api, which generates query like this:

SELECT * FROM transfer
WHERE &quot;db_updated_at&quot; &lt; &#39;2022-11-18 23:38:44+03&#39; OR (db_updated_at = &#39;2022-11-18 23:38:44+03&#39; and id &lt; 154998555017734)
ORDER BY &quot;db_updated_at&quot; DESC, &quot;id&quot; DESC LIMIT 100

But it's extremly slow ( in explain I see this:

Limit  (cost=0.56..26.92 rows=10 width=230) (actual time=182494.092..182494.273 rows=10 loops=1)
  -&gt;  Index Scan using transfer_db_updated_at_id_desc on transfer  (cost=0.56..55717286.28 rows=21142503 width=230) (actual time=182494.089..182494.266 rows=10 loops=1)
        Filter: ((db_updated_at &lt; &#39;2022-11-18 20:38:44+00&#39;::timestamp with time zone) OR ((db_updated_at = &#39;2022-11-18 20:38:44+00&#39;::timestamp with time zone) AND (id &lt; &#39;154998555017734&#39;::bigint)))
        Rows Removed by Filter: 16040385
Planning Time: 0.364 ms
Execution Time: 182494.312 ms

What it the best practice to iterate table with 2 field ordering? I can't iterate only by id because I need s specific timeframe

答案1

得分: 0

SELECT * FROM transfer
WHERE (db_updated_at, id) < ('2022-11-18 23:38:44+03', 154998555017734)
ORDER BY "db_updated_at" DESC, "id" DESC LIMIT 100;

英文:

Write it as a tuple comparison rather a complex comparison of multiple scalars:

SELECT * FROM transfer
WHERE (db_updated_at,id) &lt; (&#39;2022-11-18 23:38:44+03&#39;,154998555017734)
ORDER BY &quot;db_updated_at&quot; DESC, &quot;id&quot; DESC LIMIT 100;

huangapple
  • 本文由 发表于 2023年6月8日 06:02:02
  • 转载请务必保留本文链接:https://go.coder-hub.com/76427378.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定