How to properly use Slurm sbatch and Python multiprocessing

Question
I want to run some code using multiprocessing on a server with a Slurm architecture.
I want to limit the number of CPUs available and have the code create a child process for every one of them.
My code can be simplified in this way:
def Func(ins):
    ###
    # things...
    ###
    return var

if __name__ == '__main__':
    from multiprocessing import Pool
    from multiprocessing import active_children
    from multiprocessing import cpu_count

    p = Pool()
    print("active cpus =", cpu_count())
    print("open process =", p._processes)
    print("active_children =", len(active_children()))
    results = p.map(Func, range(2000))
    p.close()
    exit()
It is driven by this batch script:
#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=48
#SBATCH --mem=40000 # Memory per node (in MB).
module load python
conda activate myenv
python3 test.py
echo 'done!'
What I get is that the code always runs on the node's maximum number of CPUs (272), whatever combination of parameters I try:
active cpus = 272
open process = 272
active_children = 272
done!
I launch the job with the command:
sbatch job.sh
What am I doing wrong?
Answer 1
Score: 0
Pool() with no argument starts one worker per CPU reported by cpu_count(), and cpu_count() returns the total number of CPUs on the node (272 here), not the number Slurm allocated to the job. Your Python code is therefore responsible for creating the wanted number of processes based on the Slurm allocation.
If you want, as is often the case, to have one process per allocated CPU, your code should look like this:
import os

if __name__ == '__main__':
    from multiprocessing import Pool
    from multiprocessing import active_children
    from multiprocessing import cpu_count

    ncpus = int(os.environ['SLURM_CPUS_PER_TASK'])
    p = Pool(ncpus)
    print("active cpus =", cpu_count())
    print("open process =", p._processes)
    print("active_children =", len(active_children()))
    results = p.map(Func, range(2000))
    p.close()
    exit()
The SLURM_CPUS_PER_TASK environment variable will hold the value you specify in the #SBATCH --cpus-per-task=48 line in the submission script.
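As a side note: if the same script is ever run outside of a Slurm job, SLURM_CPUS_PER_TASK will not be set and the os.environ lookup will raise a KeyError. Below is a minimal sketch of one possible fallback, assuming a Linux node where os.sched_getaffinity is available; allocated_cpus is a hypothetical helper, and the Func body is a placeholder for the real workload:

import os
from multiprocessing import Pool

def Func(ins):
    # Hypothetical placeholder for the real workload.
    return ins * ins

def allocated_cpus():
    # Prefer the Slurm allocation when the job defines it.
    env = os.environ.get('SLURM_CPUS_PER_TASK')
    if env is not None:
        return int(env)
    # Otherwise count the CPUs this process is allowed to run on
    # (Linux-only; unlike cpu_count(), this respects affinity limits).
    return len(os.sched_getaffinity(0))

if __name__ == '__main__':
    ncpus = allocated_cpus()
    print("using", ncpus, "worker processes")
    with Pool(ncpus) as p:
        results = p.map(Func, range(2000))

Depending on the cluster's task-affinity configuration, Slurm may or may not pin the job to its allocated CPUs, so the affinity-based fallback is a best-effort default rather than a guarantee.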