Sharded concurrency for Cloud Tasks
Question
Google Cloud Tasks queues have a maxConcurrentDispatches parameter that specifies the number of tasks that can execute concurrently for a queue.
Now let's say I have an operation that is performed on a user, and any number (with an upper limit) of operations can execute at the same time, but only one at a time per user (e.g. to reduce database contention).
As far as I can imagine, I have two options:
- Specify maxConcurrentDispatches: 1 so that only one operation can run at a time. However, with many thousands of users, this will not scale, especially if it is a long-running operation. The operation throughput will be too slow.
- Create many different queues, one for each user, each with maxConcurrentDispatches: 1, so that any number can execute at the same time but only one for each user. However, I'm concerned that this too will not scale well. It will become a burden to maintain hundreds of thousands, if not millions, of queues (what happens if I need to pause all the queues, or change their configuration?). Also, I cannot set an upper bound on overall concurrency. Can Cloud Tasks even support a potentially unlimited number of queues?
How should I implement this scenario on Google Cloud Platform? Is there another messaging service that I could use, or a design decision that I am overlooking?
Answer 1
Score: 1
You could use Pub/Sub instead of Cloud Tasks. Configure an HTTP push subscription to call the endpoint (as Cloud Tasks does today).
The trick here is to use an ordering key. Set the user ID as the ordering key, so that the next message for that user ID won't be delivered until the current one has been acknowledged.
Many messages will be delivered at the same time, as long as they don't have the same ordering key.
The "problem" (if it is one): messages will be dequeued in the order in which they were created (FIFO).
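To make the delivery rule concrete, here is a minimal pure-Python sketch of the behaviour described above: at most one unacknowledged message per ordering key, with different keys proceeding in parallel. It is a toy model only, not the Pub/Sub client API; the class name OrderedDispatcher and its methods are hypothetical. With the real service you would, as far as I know, attach the ordering key when publishing and enable message ordering on the subscription.

```python
from collections import defaultdict, deque


class OrderedDispatcher:
    """Toy model of per-key ordered delivery (hypothetical, not the Pub/Sub API)."""

    def __init__(self):
        self._pending = defaultdict(deque)  # ordering key -> queued messages (FIFO)
        self._in_flight = {}                # ordering key -> message awaiting ack

    def publish(self, ordering_key, message):
        # Messages with the same key are queued in publish order.
        self._pending[ordering_key].append(message)

    def deliverable(self):
        """Hand out at most one message for each key that has nothing in flight."""
        batch = []
        for key, queue in self._pending.items():
            if queue and key not in self._in_flight:
                message = queue.popleft()
                self._in_flight[key] = message
                batch.append((key, message))
        return batch

    def ack(self, ordering_key):
        # Acknowledging unblocks the next message for that key.
        self._in_flight.pop(ordering_key, None)
```

In this model, publishing two messages for one user and one for another yields one message per user on the first call to deliverable(); the first user's second message only becomes deliverable after ack() is called for that user.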
Answer 2
Score: 0
You can specify the name attribute when creating a task; submitting a task with an identical name again will fail with an ALREADY_EXISTS error. So if you use the user's ID in the task name, you can get roughly one task per user.
This has a few disadvantages, however. When you specify the name, task dispatching might have increased latency (because duplicate names need to be checked). You also can't reuse a task name for some time after the previous task has finished successfully. I don't know the exact time, but it is on the scale of minutes; please check the docs.
A workaround for the latter issue could be to keep some kind of counter for each user and add it to the task name, or maybe to use a timestamp (i.e. if a user can send one task a minute, use the combination of userID + currentMinuteTimestamp()).
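A minimal sketch of the timestamp variant of that workaround, assuming a one-minute window. The helper name task_id_for and its parameters are hypothetical; the character check reflects my understanding that Cloud Tasks task IDs may contain only letters, digits, hyphens and underscores, up to 500 characters. The resulting ID would be used as the last segment of the task's full name.

```python
import re
import time
from typing import Optional

# Assumed task ID constraint: letters, digits, hyphens, underscores; max 500 chars.
_TASK_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,500}")


def task_id_for(user_id: str, now: Optional[float] = None,
                window_seconds: int = 60) -> str:
    """Build a task ID that is identical for the same user within one time window."""
    if now is None:
        now = time.time()
    # All timestamps in the same window map to the same integer bucket.
    bucket = int(now // window_seconds)
    task_id = f"{user_id}-{bucket}"
    if not _TASK_ID_RE.fullmatch(task_id):
        raise ValueError(f"not a valid task ID: {task_id!r}")
    return task_id
```

Two submissions for the same user inside the same window produce the same ID, so the second would be rejected with ALREADY_EXISTS; a submission in the next window gets a fresh ID. Note that this deduplicates submissions per window rather than guaranteeing that the previous task has finished.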