SLURM - forcing MPI to schedule different ranks on different physical CPUs

Question

I am running an experiment on an 8-node cluster under SLURM. Each CPU has 8 physical cores and supports hyperthreading. When I run a program with

#SBATCH --nodes=8
#SBATCH --ntasks-per-node=8

mpirun -n 64 bin/hello_world_mpi

it schedules two ranks on the same physical core. Adding the option

#SBATCH --ntasks-per-core=1

gives an error: SLURM reports "Batch job submission failed: Requested node configuration is not available". Is it somehow only allocating 4 physical cores per node? How can I fix this?

Answer 1

Score: 1

You can check the available CPU information in your cluster using sinfo -o%C.
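
As a rough illustration of how to read that output: `%C` prints CPU counts by state in the form allocated/idle/other/total. The sketch below splits such a string into labeled fields. The function name `parse_cpu_states` is made up for this example, and the `sinfo` calls are guarded so the snippet also runs on a machine without the Slurm client tools.

```shell
# Hypothetical helper: label the A/I/O/T string printed by `sinfo -h -o %C`.
parse_cpu_states() {
    local IFS=/
    # Intentionally unquoted so the string is split on "/".
    set -- $1
    echo "allocated=$1 idle=$2 other=$3 total=$4"
}

if command -v sinfo >/dev/null 2>&1; then
    # Cluster-wide CPU counts by state.
    parse_cpu_states "$(sinfo -h -o %C)"
    # Per-node view: name, CPUs, sockets, cores per socket, threads per core.
    sinfo -N -o "%N %c %X %Y %Z"
fi
```

The per-node view is probably the more useful one here, since it shows how many cores and threads per core Slurm believes each node has.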

I could not find a --ntasks-per-cpu option for sbatch in the documentation. You could try the sbatch option --ntasks-per-core instead. As per the documentation:

> --ntasks-per-core=<ntasks>
> Request the maximum ntasks be invoked on each core. Meant to be used with the --ntasks option. Related to --ntasks-per-node except at
> the core level instead of the node level. This option will be
> inherited by srun.
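
A quick way to sanity-check a request against this option is to compare it with the core count Slurm has configured for the node. As far as I understand, Slurm works from the configured sockets, cores per socket and threads per core (visible in `scontrol show node`), which may differ from the physical hardware. The helper below is only a back-of-the-envelope sketch; `fits_on_node` is a hypothetical name, and the numbers are illustrative, not taken from the cluster in the question.

```shell
# Hypothetical helper: can TASKS ranks fit on one node when at most
# TASKS_PER_CORE ranks may share a physical core?
# usage: fits_on_node SOCKETS CORES_PER_SOCKET TASKS_PER_CORE TASKS
fits_on_node() {
    local sockets=$1 cores_per_socket=$2 tasks_per_core=$3 tasks=$4
    local capacity=$(( sockets * cores_per_socket * tasks_per_core ))
    if [ "$tasks" -le "$capacity" ]; then
        echo "ok: $tasks tasks <= capacity $capacity"
    else
        echo "too many: $tasks tasks > capacity $capacity"
        return 1
    fi
}

# Node configured as 1 socket x 8 cores: 8 ranks with one rank per core fit.
fits_on_node 1 8 1 8
# Node configured as 1 socket x 4 cores: the same request does not fit.
fits_on_node 1 4 1 8 || true
```

If the configured values give a capacity below 8, that would be consistent with the "Requested node configuration is not available" error, although other causes are possible.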

You could also try --cpus-per-task.

> -c, --cpus-per-task=<ncpus>
> Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task. Without this option, the
> controller will just try to allocate one processor per task.
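
As a sketch of how this option might be applied to the job from the question: the snippet below writes a batch script that asks for two CPUs per task. This assumes Slurm counts each hardware thread as one CPU (16 per node here), so that each rank would typically receive both threads of one core. Whether that actually happens depends on the site configuration and on how the MPI launcher binds ranks, so treat it as something to try rather than a confirmed fix. The file name `job.sbatch` is arbitrary.

```shell
# Write a batch script that requests two logical CPUs per MPI rank.
cat > job.sbatch <<'EOF'
#!/bin/bash
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=8
# Assumption: 2 hardware threads per core, so 2 CPUs is one full core.
#SBATCH --cpus-per-task=2

mpirun -n 64 bin/hello_world_mpi
EOF

# Submit on the cluster with: sbatch job.sbatch
```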

Also please note:

> Beginning with 22.05, srun will not inherit the --cpus-per-task
> value requested by salloc or sbatch. It must be requested again with
> the call to srun or set with the SRUN_CPUS_PER_TASK environment
> variable if desired for the task(s).
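
In practice, the note above means that a batch script launching its ranks with srun needs to pass the value along itself. A minimal sketch of the lines that could go after the #SBATCH directives is shown below. It relies on SLURM_CPUS_PER_TASK, which sbatch sets only when --cpus-per-task was requested, hence the fallback to 1. The srun call is guarded only so that the fragment can also be tried outside a Slurm allocation.

```shell
# Forward the sbatch value to srun (needed from Slurm 22.05 onward).
export SRUN_CPUS_PER_TASK="${SLURM_CPUS_PER_TASK:-1}"

if command -v srun >/dev/null 2>&1; then
    srun bin/hello_world_mpi
    # Equivalent explicit form:
    # srun --cpus-per-task="$SRUN_CPUS_PER_TASK" bin/hello_world_mpi
fi
```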

huangapple
  • Published on April 13, 2023 at 20:31:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76005481.html