slurm ignores dependency on running job
Question
Suppose that on a cluster with Slurm the job with ID 12345 is currently running. I want to submit another job that will start after this job finishes. I tried `sbatch -d after:12345 job.script`, but I noticed that `scontrol show job 12346` displays `Dependency=(null)`. I therefore tried `scontrol update JobId=12346 dependency=after:12345`, but `scontrol` still shows `Dependency=(null)`. Why is this dependency ignored? Can I change anything to make this work as desired? I don't see this problem if the dependency is a job that is not running.
Answer 1
Score: 2
With `-d after:12345`, you are setting a dependency on the start of job 12345. As that job is already running, the condition is already met, so the dependency is void in practice.
What you want is either
`-d afterok:12345`, to set a dependency on the successful completion of job 12345; or
`-d afterany:12345`, to set a dependency on the end (successful, canceled, or failed) of job 12345.
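As a rough sketch of the two options above, the small helper below builds the matching option string in its long form (`--dependency=` is the long spelling of `-d`). The function name `dependency_flag` and its `ok`/`any` arguments are made up for illustration and are not part of Slurm; the job ID and `job.script` are taken from the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): print the sbatch option that makes
# a new job wait for another job to END, rather than merely to start.
#   $1: "ok"  -> wait for successful completion (afterok)
#       "any" -> wait for any termination       (afterany)
#   $2: ID of the job to wait for
dependency_flag() {
    case "$1" in
        ok|any)
            printf '%s\n' "--dependency=after$1:$2"
            ;;
        *)
            echo "unknown condition: $1 (expected ok or any)" >&2
            return 1
            ;;
    esac
}

# Intended use on a cluster where sbatch is available (left commented out so
# that the sketch has no side effects when run elsewhere):
#   sbatch "$(dependency_flag ok 12345)" job.script
```

After submitting this way, `scontrol show job` on the new job ID should list the dependency instead of `Dependency=(null)` for as long as job 12345 has not finished.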