Using all cores for 2 nodes in an HPC

Question

I am trying to run an R script in an HPC environment. The HPC system has 8 nodes with 20 cores each. I wish to use 2 nodes, utilizing 40 cores in total. I am submitting the job through SLURM, and it runs a .R file that contains parallel computing code. I have the following .sbatch script:

#!/bin/bash
#SBATCH --job-name=my_r_script   # Job name
#SBATCH --nodes=2                 # Number of nodes
##SBATCH --ntasks-per-node=20       # Number of tasks per node
#SBATCH --ntasks=40                # Total number of tasks (cores)
#SBATCH --cpus-per-task=1         # Number of CPU cores per task
#SBATCH --mem=4G                  # Memory per node (e.g., 4G, 8G, 16G)
#SBATCH --time=1:00:00            # Wall clock time limit (hh:mm:ss)

module load R    # Load the R module 

Rscript my_Rscript.R

However, when I look at the results, I can tell that it is only using 20 cores from a single node, not all 40 cores together. How do I write the .sbatch file to ensure that all 40 cores from the 2 nodes are used to run the R code in parallel?
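
For context, this behaviour is what one would expect from a script built around parallel::mclapply alone, as in the hypothetical sketch below (the actual my_Rscript.R is not shown in the question; heavy_task is a placeholder). mclapply forks worker processes on the node where the script is launched, so it can never use more than that node's 20 cores, however many nodes Slurm allocates:

library(parallel)

heavy_task <- function(i) {      # placeholder for the real per-item computation
  sum(rnorm(1e6))
}

n_cores <- detectCores()         # sees only the cores of the current node
results <- mclapply(1:100, heavy_task, mc.cores = n_cores)

saveRDS(results, "results.rds")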

I have used the idea presented in the response here: https://stackoverflow.com/a/73828155/12493753 to understand --ntasks and --cpus-per-task=1.

Answer 1

Score: 1

Your Slurm submission runs only one copy of your R script on one node, despite allocating two nodes with Slurm, unless you engage MPI in your R code. My preferred approach is to use the pbdMPI package to manage how many R sessions I run (and the cooperation between sessions), and then the parallel package's mclapply to manage multicore shared-memory computing within each session. For example, with 4 R sessions, each using 10 cores, your Slurm submission would look something like this:

#!/bin/bash
#SBATCH --job-name=my_r_script   # Job name
#SBATCH --nodes=2                 # Number of nodes
#SBATCH --exclusive               # Use all cores on allocated nodes
#SBATCH --mem=4G                  # Memory per node (e.g., 4G, 8G, 16G)
#SBATCH --time=1:00:00            # Wall clock time limit (hh:mm:ss)

module load openmpi # To load OpenMPI - may be site-dependent
module load R    # Load the R module 

## Run 2 R sessions per node (map-by is OpenMPI-specific):
mpirun --map-by ppr:2:node Rscript my_Rscript.R 

Using the 10 cores within each session would then be done with parallel::mclapply(<parameters>, mc.cores = 10). You could also use all 40 cores with --map-by ppr:20:node, in which case you would be running 40 R sessions; the latter uses more memory.
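
To make that concrete, my_Rscript.R could follow the pattern in the sketch below (a minimal, hypothetical example: params and heavy_task stand in for the real inputs and computation). With --map-by ppr:2:node, mpirun starts 4 R sessions; each session takes its share of the work based on its MPI rank and processes it with 10 forked workers via mclapply:

library(pbdMPI)
library(parallel)

init()                                   # start MPI; one R session per mpirun rank
rank <- comm.rank()                      # 0..3 with 4 sessions
size <- comm.size()                      # 4

params <- 1:100                          # hypothetical list of work items
my_params <- params[seq_along(params) %% size == rank]   # this session's share

heavy_task <- function(p) {              # placeholder for the real computation
  sum(rnorm(1e6))
}

## shared-memory parallelism within this session: 10 forked workers on this node
my_results <- mclapply(my_params, heavy_task, mc.cores = 10)

## collect every session's results on rank 0 and save them once
all_results <- gather(my_results)
if (rank == 0) saveRDS(all_results, "results.rds")

finalize()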

There are other ways to specify the same thing via Slurm or OpenMPI. Unfortunately, there are site-dependent defaults in Slurm, site-dependent deployments of R, and different flavors of MPI.
