How to properly use Slurm sbatch and Python multiprocessing

Question

I want to run code using multiprocessing on a server with a Slurm architecture.
I want to limit the number of CPUs available and have the code create a child process for each of them.

My code could be simplified in this way:

def Func(ins):
    # ... the actual per-item work ...
    return var

if __name__ == '__main__':
    from multiprocessing import Pool
    from multiprocessing import active_children
    from multiprocessing import cpu_count

    p = Pool()
    print("active cpus =", cpu_count())
    print("open process =", p._processes)
    print("active_children =", len(active_children()))
    results = p.map(Func, range(2000))
    p.close()
    exit()

It runs under this batch script:

#!/bin/bash

#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=48
#SBATCH --mem=40000 # Memory per node (in MB).

module load python 
conda activate myenv
python3 test.py

echo 'done!'

Whatever combination of parameters I try, the code always runs on the node's maximum number of CPUs (272):

active cpus = 272
open process = 272
active_children = 272
done!

I launch the job with the command:

sbatch job.sh

What am I doing wrong?


Answer 1

Score: 0


Your Python code is responsible for creating the desired number of processes based on the Slurm allocation. Pool() called with no argument starts as many workers as os.cpu_count() reports, and os.cpu_count() counts every hardware CPU on the node (272 here), not the CPUs Slurm actually allocated to your job.

If you want, as is often the case, to have one process per allocated CPU, your code should look like this:

if __name__ == '__main__':
    import os
    from multiprocessing import Pool
    from multiprocessing import active_children
    from multiprocessing import cpu_count

    # Size the pool from the Slurm allocation rather than the node's CPU count.
    ncpus = int(os.environ['SLURM_CPUS_PER_TASK'])
    p = Pool(ncpus)

    print("active cpus =", cpu_count())
    print("open process =", p._processes)
    print("active_children =", len(active_children()))
    results = p.map(Func, range(2000))
    p.close()

    exit()
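
Note that cpu_count() will still print 272 after this change: multiprocessing.cpu_count() reports the node's total hardware CPU count regardless of what Slurm allocated. The pool size shown by p._processes is what drops to 48.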

The SLURM_CPUS_PER_TASK environment variable will hold the value you specify in the #SBATCH --cpus-per-task=48 line in the submission script.
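
If the same script should also work outside of Slurm, where SLURM_CPUS_PER_TASK is not set, a more defensive variant is to fall back to the process's CPU affinity mask. Below is a minimal sketch of that idea, assuming a Linux node; os.sched_getaffinity(0) returns the set of CPUs the current process may run on, which clusters that pin jobs to their allocated CPUs will have narrowed accordingly:

import os
from multiprocessing import Pool

def func(x):
    # Placeholder for the real per-item work.
    return x * x

if __name__ == '__main__':
    # Prefer the Slurm allocation; otherwise fall back to the CPU
    # affinity mask (Linux-only) so the pool is still sized sensibly
    # when the script is launched outside of sbatch.
    ncpus = int(os.environ.get('SLURM_CPUS_PER_TASK',
                               len(os.sched_getaffinity(0))))
    with Pool(ncpus) as p:
        results = p.map(func, range(2000))
    print("used", ncpus, "worker processes")

The with-block replaces the manual p.close() call; on exit it calls terminate(), which is safe here because map() has already collected all results.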
