MPIRUN is not executing on Worker node despite hostfile and SSH access

Question

I am running a simple helloworld.py demo on my main node, with a single worker (a VM) listed in the machinefile. I have installed mpirun on the worker as well and placed the script there too (I am not sure exactly where it belongs; it is in /home/user/mpirun-master/demo).

MPI does check SSH access to the worker node before executing, but the job runs only on my main node and no process output comes from the worker.

This is the content of my machinefile:

dell@172.16.197.1 # main node
kypo-1@172.16.197.129 # worker

And this is the output I am getting:

mpirun -np 2 --machinefile machinefile python3 helloworld.py
Invalid MIT-MAGIC-COOKIE-1 keyHello, World! I am process 1 of 2 on dell-MS-7A70.
Hello, World! I am process 0 of 2 on dell-MS-7A70

Both processes are running on dell-MS-7A70 (the main machine's device name). How can I make a process run on the worker node? Is this problem caused by the worker machine being a virtual one?

Answer 1

Score: 0

The issue was resolved when I created an account with the same name on my worker node and fixed the slot numbers in the machinefile for the master and the nodes, as my script was preferring the master each time.

Now my machinefile looks like:

172.16.197.129 max_slots=3 # worker 
172.16.197.1 max_slots=1 # master
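
To illustrate why the order and slot counts in the machinefile matter, here is a minimal sketch of fill-by-slot placement: ranks are handed out in file order, and each host is filled before the next one is used. It is a simplified model, not the launcher's actual implementation. The function names `parse_machinefile` and `place_ranks` are made up for this example, `default_slots=4` is a hypothetical stand-in for whatever the launcher assumes when a host has no slot setting, and `slots` and `max_slots` are treated as the same thing for simplicity.

```python
# Simplified model of fill-by-slot rank placement. Real MPI launchers apply
# more rules (oversubscription, mapping policies); this only shows the idea.


def parse_machinefile(text):
    """Return a list of (host, slots) pairs from hostfile-style text."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        slots = None  # None means the line did not specify a slot count
        for field in fields[1:]:
            key, _, value = field.partition("=")
            # Assumption: treat slots and max_slots alike in this sketch.
            if key in ("slots", "max_slots"):
                slots = int(value)
        hosts.append((host, slots))
    return hosts


def place_ranks(hosts, num_procs, default_slots):
    """Assign ranks to hosts in file order, filling each host's slots first."""
    placement = []
    for host, slots in hosts:
        capacity = default_slots if slots is None else slots
        for _ in range(capacity):
            if len(placement) == num_procs:
                return placement
            placement.append(host)
    if len(placement) < num_procs:
        raise ValueError("not enough slots for the requested process count")
    return placement


original = """
dell@172.16.197.1 # main node
kypo-1@172.16.197.129 # worker
"""

fixed = """
172.16.197.129 max_slots=3 # worker
172.16.197.1 max_slots=1 # master
"""

# default_slots=4 is a hypothetical value, not taken from the question.
print(place_ranks(parse_machinefile(original), 2, default_slots=4))
print(place_ranks(parse_machinefile(fixed), 2, default_slots=4))
```

Under these assumptions, the original machinefile puts both ranks on the main node, because it is listed first and has room for both. The fixed machinefile puts both ranks on the worker, which is now listed first with three slots.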

huangapple
  • This article was published on June 22, 2023, 15:19:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76529425.html