Slurm ignores dependency on a running job

Question

Suppose that on a cluster managed by Slurm, the job with ID 12345 is currently running. I want to submit another job that starts after this job finishes. I tried sbatch -d after:12345 job.script, but scontrol show job 12346 displays Dependency=(null). I then tried scontrol update JobId=12346 dependency=after:12345, but scontrol still shows Dependency=(null). Why is this dependency ignored? Can I change anything to make this work as intended? I don't see this problem when the dependency is a job that is not running.

Answer 1

Score: 2

With -d after:12345, you are setting a dependency on the start of job 12345, not on its end. Because that job is already running, the dependency is already satisfied when you submit, which is consistent with scontrol showing Dependency=(null).

What you want is either

  • -d afterok:12345, to wait for the successful completion of job 12345; or
  • -d afterany:12345, to wait for the end of job 12345, whether it succeeds, fails, or is canceled.
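The two options above can be sketched as a small Bash script. This is only an illustration, assuming a Bash shell on the login node: the job ID 12345 and job.script come from the question, while build_sbatch_cmd, DEP_TYPE, RUNNING_JOB_ID and DRY_RUN are hypothetical names made up for this example and are not Slurm settings. It uses --dependency, the long form of -d.

```shell
#!/usr/bin/env bash
# Sketch: submit a follow-up job that waits for an already running job to END.
# build_sbatch_cmd, DEP_TYPE, RUNNING_JOB_ID and DRY_RUN are illustrative
# names for this example only; they are not part of Slurm.

# Print the sbatch command line for a given dependency type and job ID.
build_sbatch_cmd() {
    local dep_type="$1"   # afterok (success only) or afterany (any end state)
    local job_id="$2"     # ID of the job to wait on
    local script="$3"     # batch script to submit
    printf 'sbatch --dependency=%s:%s %s\n' "$dep_type" "$job_id" "$script"
}

RUNNING_JOB_ID="${RUNNING_JOB_ID:-12345}"
DEP_TYPE="${DEP_TYPE:-afterany}"

cmd="$(build_sbatch_cmd "$DEP_TYPE" "$RUNNING_JOB_ID" job.script)"
echo "$cmd"

# By default the command is only printed; set DRY_RUN=0 to actually submit.
if [ "${DRY_RUN:-1}" = "0" ]; then
    $cmd
fi
```

After a real submission, scontrol show job with the new job's ID should list the afterok or afterany dependency for as long as job 12345 is still running.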

Posted on June 9, 2023, 06:09:41