Question about the salloc command: Where does it execute?

Question

I have a question about the salloc command in a cluster environment. When I run salloc -n 1 --gpus=1 hostname on the login node, it prints the hostname of the login node, although I expected the hostname of the compute node. Similarly, when I run salloc -n 1 --gpus=1, it starts /bin/bash on the login node with the resources allocated.

My question is: if the command is not a shell like /bin/bash, does salloc have any effect? Does it only allocate resources and run the command on the login node, without using the compute nodes? It seems that salloc only uses the compute nodes when executing shell commands.

I would appreciate any clarification on this matter. Thank you.

Answer 1

Score: 1

With the default configuration, salloc only creates an allocation: it requests resources, blocks until they are available, and then starts a shell (or the command you pass to it, such as hostname) on the login node, not on the allocated node. In that shell, you can start a parallel program with srun or mpirun, and its processes will run on the allocated nodes. Or you can run

srun --pty /bin/bash -l

and you will have a shell running on the allocated node.
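
To see this difference for yourself, a small script along the following lines may help. It is only a sketch: the function name where_am_i and the file name where.sh are made up for illustration, and it assumes the script sits on a filesystem that the compute nodes can also read. It relies on two environment variables that Slurm sets: SLURM_JOB_ID (present inside an allocation) and SLURMD_NODENAME (present for tasks launched by srun).

```shell
#!/bin/sh
# where_am_i (hypothetical helper): print the host this process runs on
# and how much Slurm context it can see.
where_am_i() {
    host=$(uname -n)
    if [ -n "${SLURMD_NODENAME:-}" ]; then
        # Set by Slurm for tasks started through srun on a compute node.
        echo "step task on ${SLURMD_NODENAME} (host ${host}), job ${SLURM_JOB_ID:-unknown}"
    elif [ -n "${SLURM_JOB_ID:-}" ]; then
        # An allocation exists, but this process was not launched by srun.
        echo "inside allocation ${SLURM_JOB_ID}, but running on ${host}"
    else
        echo "no Slurm allocation, running on ${host}"
    fi
}

where_am_i
```

Saved as where.sh and made executable, running salloc -n 1 --gpus=1 ./where.sh should report the login node with a job ID, while salloc -n 1 --gpus=1 srun ./where.sh should report the compute node, assuming the default configuration described above.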

Alternatively, and this has been the officially recommended way for some time, you can use the srun command directly (i.e., outside a salloc session) like this:

srun -n 1 --gpus=1 --pty /bin/bash -l

for the same result.

This has confused users for a long time, especially since Slurm used to recommend defining SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --cpu-bind=no --mpi=none $SHELL" in slurm.conf, which started an srun session automatically whenever a user ran salloc.

In newer versions, Slurm has the option LaunchParameters=use_interactive_step, which is meant to become the default. It makes salloc the command to use for getting a shell on the first node of the allocation, while properly handling cgroups and tasksets.
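
If you are unsure how your cluster is configured, one way to check is to look for that option in the output of scontrol show config. The snippet below is a minimal sketch; the function name has_interactive_step is made up for illustration, and the pattern assumes the usual "Name = value" layout of that output.

```shell
#!/bin/sh
# has_interactive_step (hypothetical helper): reads Slurm configuration text
# on stdin and succeeds if LaunchParameters lists use_interactive_step.
has_interactive_step() {
    grep -Eiq '^[[:space:]]*LaunchParameters[[:space:]]*=.*use_interactive_step'
}

# Only query the cluster when scontrol is actually installed.
if command -v scontrol >/dev/null 2>&1; then
    if scontrol show config | has_interactive_step; then
        echo "use_interactive_step is enabled"
    else
        echo "use_interactive_step is not enabled"
    fi
fi
```

When the option is not enabled, the srun --pty approach shown earlier remains the way to get a shell on an allocated node.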

huangapple
  • Published on June 29, 2023, 09:42:48
  • When reposting, please retain the link to this article: https://go.coder-hub.com/76577593.html