Will createJob/createTask work for my function? What is the difference between creating multiple jobs and creating multiple tasks in one job?

Question

I want to run multiple completely independent scripts in parallel. They differ from each other only by one or two parameters, so I wrote the main part as a function and pass the parameters through createJob and createTask as follows:

% Run_DMRG_HubbardKondo
UList = [1, 2, 4, 8];
J_UList = [-1, 0:0.2:2];
c = parcluster;
c.NumThreads = 3;
j = createJob(c);
for iU = 1:numel(UList)
    for iJ_U = 1:numel(J_UList)
        t = createTask(j, @DMRG_HubbardKondo, 0, {{UList(iU), J_UList(iJ_U)}});
    end
end
submit(j);
wait(j,'finished')
delete(j);
clear j t
exit
function DMRG_HubbardKondo(U_Job, J_U_Job)
...% (skipped)
end

What if I call createJob multiple times, each with a single createTask? I know createJob has options such as AttachedFiles. But with respect to independence, is there any difference between createJob and createTask? I ask about independence because the DMRG_HubbardKondo function calls setenv, as follows:

function DMRG_HubbardKondo(U_Job, J_U_Job)
...% (skipped)
DirTmp = '/tmp/swan';
setenv('LMA', DirTmp)
Para.DateStr = datestr(datetime('now'), 30);
% RCDir named by parameter and datetime
Para.RCDir = [DirTmp, '/RCStore', Para.DateStr, sprintf('U%.4gJ%.4g', [U_Job, J_U_Job])];
k = [strfind(Para.Symm, 'SU2'), strfind(Para.Symm, '-v')];
if ~isempty(k)
    RC = Para.RCDir
    if exist(RC, 'dir') == 0
        mkdir(RC);    % create if it does not exist
        fprintf('%s made.\n', RC);
    end
    setenv('RC_STORE', RC);
    setenv('CG_VERBOSE', '0');
end
... % (skipped)
end

The main part of DMRG_HubbardKondo uses some MEX-compiled functions that work along the lines of the Wigner-Eckart theorem. Specifically, they generate and retrieve data (CG coefficients) in RCDir at every step. I guess those MEX functions find the corresponding RCDir through getenv, and I want to know whether createJob/createTask will work correctly with this.

In summary, my questions are:

  1. What is the difference between creating multiple tasks in one job and creating multiple jobs, each with one task?
  2. Will createJob/createTask work for my function?

I know sbatch works if I write a script that passes the parameters into Submit.sh, as follows:

function GenSubmitsh(partition, nodeNo, TLim, NCore, mem, logName, JobName, ParaName, ScriptName)

if isnan(nodeNo)
    nodeStr = '##SBATCH --nodelist=auto \n';
else
    nodeStr = sprintf('#SBATCH --nodelist=node%g \n', nodeNo);
end

Submitsh = sprintf([
    '#!/bin/bash -l \n', ...
    '#SBATCH --partition=%s \n', ...
    nodeStr, ...
    '#SBATCH --exclude=node1051 \n', ...
    '#SBATCH --time=%s \n', ...
    '#SBATCH --nodes=1 \n', ...
    '#SBATCH --ntasks=1 \n', ...
    '#SBATCH --cpus-per-task=%g \n', ...
    '#SBATCH --mem=%s \n', ...
    '#SBATCH --output=%s \n', ...
    '#SBATCH --job-name=%s \n', ...
    '\n', ...
    '##Do not remove or change this line in GU_CLUSTER \n', ...
    '##export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK     \n', ...
    '\n', ...
    'echo "Job Started At" \n', ...
    'date \n', ...
    '\n', ...
    'matlab -nodesktop -nojvm -nodisplay -r "ParaName=''%s'',%s" \n', ...
    '\n', ...
    'echo "Job finished at" \n', ...
    'date \n'], ...
    partition, TLim, NCore, mem, logName, JobName, ParaName, ScriptName);

fileID = fopen('Submit.sh', 'w');
fprintf(fileID, '%s', Submitsh);
fclose(fileID);

end

I hope createJob/createTask will work equivalently (i.e., with the runs completely independent).

Answer 1

Score: 1

There are only minor differences between multiple createJob calls each with a single createTask vs. single createJob with multiple createTask calls. I would say it is generally better to use a single Job with multiple Tasks, unless you have a specific reason not to. Here are some considerations:

  • Having a single Job object allows some stages of the submission process to be done once instead of multiple times (for example, parts of attaching files).

  • It is possible (although admittedly awkward) to vectorise the calls to createTask. This doesn't affect execution.

  • On the MATLAB Job Scheduler (MJS) system, you can set more properties per Job object, such as the range of workers to be used during execution.

  • When using schedulers such as SLURM, multiple Tasks of a single Job can be submitted to the scheduler as a "job array", which I believe can be more efficient for the scheduler itself.

  • When using schedulers other than MJS, each Task runs in a fresh MATLAB worker process, regardless of whether it is the only Task in a Job or not.
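
As a rough illustration of the single-Job approach and the vectorised createTask call mentioned above, here is a minimal sketch. It assumes Parallel Computing Toolbox and a working default cluster profile. The function reportEnv is a hypothetical stand-in for DMRG_HubbardKondo, and the directory name it builds is made up for the example. It only sets RC_STORE and reads it back, so the returned values show what each task saw in its own worker process.

```matlab
% Sketch only: one job, one task per (U, J/U) pair, created by a single
% vectorised createTask call. reportEnv is a hypothetical placeholder
% for the real task function.
UList   = [1, 2, 4, 8];
J_UList = [-1, 0:0.2:2];

% Build every (U, J/U) combination.
[UGrid, JGrid] = ndgrid(UList, J_UList);

% One inner cell per task: {{U1, J1}, {U2, J2}, ...}
argSets = arrayfun(@(u, jU) {u, jU}, UGrid(:).', JGrid(:).', ...
                   'UniformOutput', false);

c = parcluster;
j = createJob(c);
createTask(j, @reportEnv, 1, argSets);   % creates numel(argSets) tasks
submit(j);
wait(j);

out = fetchOutputs(j);   % cell array, one row per task
disp(out);
delete(j);

% --- reportEnv.m (hypothetical; save as its own file so that the
% --- workers can find it, or list it in the job's AttachedFiles)
function rc = reportEnv(U, J_U)
    % Set RC_STORE the way the question does, then read it back from
    % the same worker process.
    setenv('RC_STORE', sprintf('/tmp/swan/RCStore_U%.4gJ%.4g', U, J_U));
    rc = getenv('RC_STORE');
end
```

Environment variables set with setenv belong to the process that sets them, so tasks running in separate worker processes do not see each other's values. On MJS, where a worker process may be reused for later tasks, setting the variables at the start of every task, as the question's function does, should keep each run self-contained, but that is worth checking on the actual cluster.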

Posted on February 8, 2023, 13:51:36
Source: https://go.coder-hub.com/75381818.html