Restricting tasks from running multiple times when multiple pods restart at the same time

Question

I am new to Kubernetes and have been stuck on one point.

Let's say I have multiple pods with some tasks running. When a pod suddenly stops for whatever reason, I save the state of each task in a database (terminated but not completed), either by catching the SIGTERM signal or by using terminationGracePeriod.
So, assuming I have 10 terminated tasks, I want to restart those tasks when the pods restart. If multiple pods restart, they will all fetch the terminated tasks from the database, set their status to "In Progress", and start them. So instead of starting once, each task will start multiple times, because multiple pods found it terminated. I don't want to apply locks on the database, as that would slow down my code. So how can I ensure that only one pod fetches each terminated task and starts it only once?

FYI, I am implementing the task restarts in Go.
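For context, the save-on-shutdown step the question describes could look roughly like this in Go. This is a minimal sketch under assumptions: runTask and saveStatus are hypothetical names, the work is simulated with short sleeps, and saveStatus stands in for the real database write. Kubernetes sends SIGTERM to the container first and only sends SIGKILL after the pod's terminationGracePeriodSeconds (30 seconds by default) has elapsed, so the save has to finish within that window.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// saveStatus is a hypothetical hook standing in for the database write that
// records a task's state (for example an UPDATE on the task's row).
type saveStatus func(taskID int64, status string)

// runTask does the task's work in small steps. Between steps it checks stop;
// if stop has been closed it records the task as "terminated" so another pod
// can resume it later. It returns the final status.
func runTask(taskID int64, steps int, stop <-chan struct{}, save saveStatus) string {
	for i := 0; i < steps; i++ {
		select {
		case <-stop:
			save(taskID, "terminated")
			return "terminated"
		default:
		}
		time.Sleep(10 * time.Millisecond) // placeholder for one unit of real work
	}
	save(taskID, "completed")
	return "completed"
}

func main() {
	// Turn SIGTERM (sent by Kubernetes when the pod is stopped) into a closed
	// channel; os.Interrupt (Ctrl+C) is included so the sketch can be tried locally.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, os.Interrupt)
	stop := make(chan struct{})
	go func() {
		<-sigs
		close(stop)
	}()

	status := runTask(42, 100, stop, func(id int64, st string) {
		fmt.Printf("task %d saved as %s\n", id, st)
	})
	fmt.Println("final status:", status)
}
```

Checking stop between small units of work keeps the shutdown path short, which matters because anything still running when the grace period ends is killed without a chance to save.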

Answer 1

Score: 1

Store the state of each task in a database, and when a pod terminates, update the task's state to 'terminated'. Then, when pods start up again, query for tasks that are 'terminated' and need to be continued. Pick a random ID from those tasks and run a single UPDATE statement that sets the status to 'running' (make sure to include WHERE status = 'terminated' alongside the ID condition). A single UPDATE statement in SQL is atomic, meaning no other transaction can modify the row while it is being updated: if two pods try to claim the same row at the same time, the database serializes the two UPDATEs, and the second one finds that the status is no longer 'terminated' and matches no rows (at least under the default isolation levels of PostgreSQL and MySQL/InnoDB). When using an ORM like GORM, you get back a result containing the number of rows that were modified. If that number is not equal to 1, another pod has already claimed this task, so grab another ID and try again until an UPDATE reports exactly 1 updated row.
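A minimal sketch of this claim-and-retry loop, assuming Go's database/sql, a hypothetical tasks table with id and status columns, and PostgreSQL-style $1 placeholders (MySQL and SQLite use ? instead). The names claimSQL, claimNext, memDB and simulatePods are illustrative, not part of any library. memDB is an in-memory fake so the race between pods can be run without a real database; its mutex stands in for the row lock the database holds during a single UPDATE.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"math/rand"
	"sync"
)

// claimSQL tries to flip one task from 'terminated' to 'running'. The table
// "tasks", its columns and the PostgreSQL-style $1 placeholder are assumptions.
func claimSQL(ctx context.Context, db *sql.DB, id int64) (bool, error) {
	res, err := db.ExecContext(ctx,
		`UPDATE tasks SET status = 'running' WHERE id = $1 AND status = 'terminated'`, id)
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	return n == 1, err // exactly one changed row means this pod won the race
}

// claimNext picks a random terminated task and tries to claim it, fetching a
// fresh list and retrying whenever another pod got there first. Its second
// result is false once no terminated tasks are left.
func claimNext(list func() ([]int64, error), claim func(int64) (bool, error)) (int64, bool, error) {
	for {
		ids, err := list()
		if err != nil || len(ids) == 0 {
			return 0, false, err
		}
		id := ids[rand.Intn(len(ids))]
		won, err := claim(id)
		if err != nil || won {
			return id, won, err
		}
	}
}

// memDB fakes the database for the demo; its mutex acts like the UPDATE's row lock.
type memDB struct {
	mu     sync.Mutex
	status map[int64]string
}

func (m *memDB) list() ([]int64, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	var ids []int64
	for id, st := range m.status {
		if st == "terminated" {
			ids = append(ids, id)
		}
	}
	return ids, nil
}

// claim mirrors UPDATE ... WHERE id = ? AND status = 'terminated'.
func (m *memDB) claim(id int64) (bool, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.status[id] != "terminated" {
		return false, nil
	}
	m.status[id] = "running"
	return true, nil
}

// simulatePods races `pods` goroutines for the same tasks; returns claims per task ID.
func simulatePods(pods, tasks int) map[int64]int {
	db := &memDB{status: map[int64]string{}}
	for i := 1; i <= tasks; i++ {
		db.status[int64(i)] = "terminated"
	}
	claims := map[int64]int{}
	var mu sync.Mutex
	var wg sync.WaitGroup
	for p := 0; p < pods; p++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				id, ok, err := claimNext(db.list, db.claim)
				if err != nil || !ok {
					return
				}
				mu.Lock()
				claims[id]++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return claims
}

func main() {
	fmt.Println("distinct tasks claimed:", len(simulatePods(5, 10)))
}
```

Running it prints "distinct tasks claimed: 10", and each task ends up claimed by exactly one of the five simulated pods. With GORM instead of database/sql, the same check would use the RowsAffected field of the *gorm.DB returned by something like db.Model(&Task{}).Where("id = ? AND status = ?", id, "terminated").Update("status", "running"), where Task is your own (here hypothetical) model type.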

This is just an idea, with no guarantee that it will work for you, since I don't know the full extent of your tech stack (which DB, ORM, etc.).

  • Posted by huangapple on April 7, 2022 at 17:35:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/71779727.html