PostgreSQL最佳实践 – 为多个消费者创建队列表

huangapple go评论56阅读模式
英文:

postgres best practice - queuing table for many consumers

问题

最佳实践是如何在PostgreSQL中实现以下内容:

  • 一个系统将任务放入队列,每分钟约5000个任务。这些任务携带不同大小的有效负载(从几KB到几MB不等)。
  • 然后,多个消费者会拾取这些任务。首先,他们将任务标记为已取(进行中)。然后,在任务完成后,他们将任务从队列中删除。

我们目前的做法是有一个名为'tasks'的表,其中有一个列'status'和一个列'payload'。每分钟向此表插入5000个任务。然后,这5000条记录会分别更新为status=in-progress,因为每个任务都会单独处理。然后,当任务完成时,这5000条记录会再次单独删除。我们在表上有几个索引,有效负载存储在表中(尽管有时被分段)。由于这个原因,我们看到了非常高数量的WALWrite,这是有道理的。我们不认为禁用同步提交是一个选项。

因此,我想知道在PostgreSQL中如何优化这样一个排队系统?是否应该使用多个表而不是一个表?是否应该使用仅插入表?是否应该取消索引(用于HOT更新)?有效负载是否应该存储在单独的表中,而不是被更新,并可能异步删除?

英文:

What is the best practice to implement the following in postgres:

  • One system puts tasks into a queue, about 5.000 per minute. The tasks carry a payload of varying sizes (a few KB to several MB)
  • Several consumers now pick up these tasks. First, they mark a task as taken (in-progress). Then, when the task is done they remove the task from the queue.

The way we are doing it is having a table 'tasks' that has a column 'status' and a column 'payload'. Into this table 5000 tasks are inserted per minute. These 5000 records are then - each individually, as each task is processed individually - updated to status=in-progress. Then, these 5000 records are deleted - again individually - when the tasks are done. We have several indexes on the table, the payload sits in the table (albeit sometimes toasted). We are seeing a very high amount of WALWrite because of this, which makes sense. We do not see disabling synchronous commit as an option.

So I am curious how would such a queuing system optimally look in postgres? Use several tables instead of one? Insert only tables? No indexes (for HOT-updates)? Payload in separate table that is not updated and maybe deleted asynchronously?

答案1

得分: 2

你有任务和你有一个队列 - 这些看起来像是两个表格。你的任务有它的id和有效负载数据等。你的队列有任务id、时间戳、状态等。

如果你想要某种形式的“自动同步”这些内容,你可以设置触发器来将任何新任务加入队列,并设置一个ON DELETE外键,当任务被删除时移除队列项。

英文:

You have tasks and you have a queue - these look like two tables to me. Your task has its id and payload data etc. Your queue has the task id, timestamps, status etc.

If you want some sort of "automagical" synchronisation of this stuff, you can have triggers to queue any new tasks and an ON DELETE foreign key that removes the queue item when the task is deleted.

huangapple
  • 本文由 发表于 2023年7月6日 21:37:53
  • 转载请务必保留本文链接:https://go.coder-hub.com/76629455.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定