英文:
Best practice for pagination big table
问题
我有一个相当大的表,约有100百万行:
create table transfer
(
id bigint not null primary key,
db_updated_at timestamp with time zone,
...
);
我需要按照db_updated_at
降序和id
降序的顺序迭代它。有一个带有REST API的应用程序,它生成类似这样的查询:
SELECT * FROM transfer
WHERE "db_updated_at" < '2022-11-18 23:38:44+03' OR (db_updated_at = '2022-11-18 23:38:44+03' and id < 154998555017734)
ORDER BY "db_updated_at" DESC, "id" DESC LIMIT 100
但它非常慢(在解释计划中我看到这个):
Limit (cost=0.56..26.92 rows=10 width=230) (actual time=182494.092..182494.273 rows=10 loops=1)
-> Index Scan using transfer_db_updated_at_id_desc on transfer (cost=0.56..55717286.28 rows=21142503 width=230) (actual time=182494.089..182494.266 rows=10 loops=1)
Filter: ((db_updated_at < '2022-11-18 20:38:44+00'::timestamp with time zone) OR ((db_updated_at = '2022-11-18 20:38:44+00'::timestamp with time zone) AND (id < '154998555017734'::bigint)))
Rows Removed by Filter: 16040385
Planning Time: 0.364 ms
Execution Time: 182494.312 ms
在有两个字段排序的情况下,迭代表的最佳实践是什么?我不能仅按id
迭代,因为我需要一个特定的时间段。
英文:
I have a pretty big table ~100m rows:
create table transfer
(
id bigint not null primary key,
db_updated_at timestamp with time zone,
...
);
I need to iterate it ordering by db_updated_at desc, id desc.
There's application with REST api, which generates query like this:
SELECT * FROM transfer
WHERE "db_updated_at" < '2022-11-18 23:38:44+03' OR (db_updated_at = '2022-11-18 23:38:44+03' and id < 154998555017734)
ORDER BY "db_updated_at" DESC, "id" DESC LIMIT 100
But it's extremly slow ( in explain I see this:
Limit (cost=0.56..26.92 rows=10 width=230) (actual time=182494.092..182494.273 rows=10 loops=1)
-> Index Scan using transfer_db_updated_at_id_desc on transfer (cost=0.56..55717286.28 rows=21142503 width=230) (actual time=182494.089..182494.266 rows=10 loops=1)
Filter: ((db_updated_at < '2022-11-18 20:38:44+00'::timestamp with time zone) OR ((db_updated_at = '2022-11-18 20:38:44+00'::timestamp with time zone) AND (id < '154998555017734'::bigint)))
Rows Removed by Filter: 16040385
Planning Time: 0.364 ms
Execution Time: 182494.312 ms
What it the best practice to iterate table with 2 field ordering? I can't iterate only by id because I need s specific timeframe
答案1
得分: 0
SELECT * FROM transfer
WHERE (db_updated_at, id) < ('2022-11-18 23:38:44+03', 154998555017734)
ORDER BY "db_updated_at" DESC, "id" DESC LIMIT 100;
英文:
Write it as a tuple comparison rather a complex comparison of multiple scalars:
SELECT * FROM transfer
WHERE (db_updated_at,id) < ('2022-11-18 23:38:44+03',154998555017734)
ORDER BY "db_updated_at" DESC, "id" DESC LIMIT 100;
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论