Using all cores of 2 nodes in an HPC
Question
I am trying to run an R script in an HPC environment. The HPC system has 8 nodes with 20 cores each. I wish to use 2 nodes, utilizing 40 cores in total. I am submitting the job through SLURM, and it runs the .R file, which contains parallel computing code. I have the following .sbatch script:
#!/bin/bash
#SBATCH --job-name=my_r_script # Job name
#SBATCH --nodes=2 # Number of nodes
##SBATCH --ntasks-per-node=20 # Number of tasks per node
#SBATCH --ntasks=40 # Total number of tasks (cores)
#SBATCH --cpus-per-task=1 # Number of CPU cores per task
#SBATCH --mem=4G # Memory per node (e.g., 4G, 8G, 16G)
#SBATCH --time=1:00:00 # Wall clock time limit (hh:mm:ss)
module load R # Load the R module
Rscript my_Rscript.R
However, when I look at the results, I can see that it is only using 20 cores from a single node and not all 40 cores together. How do I write the .sbatch file to ensure that all 40 cores from the 2 nodes are utilized to run the R code in parallel?
I have used the idea presented in the response here: https://stackoverflow.com/a/73828155/12493753 to understand --ntasks and --cpus-per-task=1.
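For context, here is a minimal sketch of the kind of single-node parallel code assumed in the question (the actual my_Rscript.R is not shown; detectCores and the sqrt workload are purely illustrative). parallel::mclapply only forks workers on the node the script is launched on, which is why a script like this cannot reach the second node by itself:
library(parallel)                        # hypothetical stand-in for my_Rscript.R, not the asker's actual code
n_cores <- detectCores()                 # sees only the cores of the node the script runs on (20 here)
results <- mclapply(1:1000,              # illustrative work items
                    function(i) sqrt(i), # placeholder for the real per-item computation
                    mc.cores = n_cores)  # forked workers all stay on this single node
cat("Finished", length(results), "items using", n_cores, "cores\n")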
Answer 1
Score: 1
Your Slurm submission is running only one copy of your R script on one node, despite allocating two nodes with Slurm, unless you engage MPI in your R code. My preferred way is to use the pbdMPI package to manage how many R sessions I run (and the cooperation between sessions), and then use the parallel package's mclapply to manage multicore shared-memory computing within each session. For example, with 4 R sessions, each using 10 cores, your Slurm submission would look something like this:
#!/bin/bash
#SBATCH --job-name=my_r_script # Job name
#SBATCH --nodes=2 # Number of nodes
#SBATCH --exclusive # Use all cores on allocated nodes
#SBATCH --mem=4G # Memory per node (e.g., 4G, 8G, 16G)
#SBATCH --time=1:00:00 # Wall clock time limit (hh:mm:ss)
module load openmpi # To load OpenMPI - may be site-dependent
module load R # Load the R module
## Run 2 R sessions per node (map-by is OpenMPI-specific):
mpirun --map-by ppr:2:node Rscript my_Rscript.R
The use of the 10 cores would be done with parallel::mclapply(<parameters>, mc.cores = 10). You could also use all 40 cores with --map-by ppr:20:node, in which case you would be running 40 R sessions. The latter would use more memory.
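To make the split concrete, here is a minimal sketch of what my_Rscript.R could look like under this approach (the item count, the sqrt workload, and the final sum are purely illustrative; get.jid from pbdMPI is one convenient way to divide the work items across the mpirun-launched R sessions):
library(pbdMPI)
library(parallel)

init()                                    # one R session per MPI rank launched by mpirun

n_items  <- 1000                          # illustrative total number of work items
my_items <- get.jid(n_items)              # the subset of items assigned to this rank

## Shared-memory parallelism within this session; mc.cores = 10 matches
## the ppr:2:node example on 20-core nodes.
my_results <- mclapply(my_items,
                       function(i) sqrt(i),   # placeholder per-item computation
                       mc.cores = 10)

## Combine the per-rank partial results (a simple sum, for illustration).
total <- allreduce(sum(unlist(my_results)), op = "sum")
comm.print(total)                         # printed by rank 0 by default

finalize()
Launched with mpirun --map-by ppr:2:node Rscript my_Rscript.R as above, this gives 4 ranks with 10 forked workers each, i.e. 40 busy cores across the two nodes.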
There are other ways to specify the same thing via either Slurm or OpenMPI. Unfortunately, there are site-dependent defaults in Slurm, site-dependent deployments of R, and different flavors of MPI.