mpirun is not executing on the worker node despite hostfile and SSH access
Question
I am running the simple helloworld.py demo on my main node, with a single worker (a VM) listed in the machinefile. I have installed mpirun on the worker as well and placed the script there too (I am not sure exactly where it should go; I put it in /home/user/mpirun-master/demo).
MPI does check SSH access to the worker node before executing, but the job runs only on my main node and no process output comes from the worker.
This is the content of my machinefile:
dell@172.16.197.1 # main node
kypo-1@172.16.197.129 # worker
And this is the output I am getting:
mpirun -np 2 --machinefile machinefile python3 helloworld.py
Invalid MIT-MAGIC-COOKIE-1 keyHello, World! I am process 1 of 2 on dell-MS-7A70.
Hello, World! I am process 0 of 2 on dell-MS-7A70
Both processes are running on dell-MS-7A70 (the main machine's hostname). How can I make a process run on the worker node? Is this problem caused by the worker machine being a virtual one?
Answer 1
Score: 0
The issue was resolved when I created an account with the same name on my worker node and fixed the slot numbers in the machinefile for the master and worker nodes, as my script was preferring the master each time.
Now my machinefile looks like this:
172.16.197.129 max_slots=3 # worker
172.16.197.1 max_slots=1 # master
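To illustrate why the order of hosts and their slot limits can change where ranks land, here is a toy model in Python. It is only a sketch under stated assumptions, not the launcher's actual code: it assumes a fill-first policy (each host's slots are used up before moving to the next host), treats max_slots as the host's slot count, and uses a made-up default_slots value for hosts that give no limit. The function names parse_machinefile and place_fill_first are hypothetical.

```python
from collections import Counter


def parse_machinefile(text):
    """Return a list of (host, slot_limit) pairs; '#' starts a comment.

    slot_limit is None when the line gives no slots= or max_slots= value.
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        limit = None
        for field in fields[1:]:
            key, _, value = field.partition("=")
            # Assumption: both keys are treated as "how many ranks fit here".
            if key in ("slots", "max_slots"):
                limit = int(value)
        hosts.append((fields[0], limit))
    return hosts


def place_fill_first(hosts, nprocs, default_slots):
    """Assign ranks host by host, filling one host before using the next."""
    placement = []
    for host, limit in hosts:
        capacity = default_slots if limit is None else limit
        while capacity > 0 and len(placement) < nprocs:
            placement.append(host)
            capacity -= 1
    if len(placement) < nprocs:
        raise ValueError("not enough slots for the requested process count")
    return placement


if __name__ == "__main__":
    original = "dell@172.16.197.1 # main node\nkypo-1@172.16.197.129 # worker\n"
    fixed = "172.16.197.129 max_slots=3 # worker\n172.16.197.1 max_slots=1 # master\n"
    # default_slots=4 is an assumed core count, not a value from the question.
    for name, text in (("original", original), ("fixed", fixed)):
        ranks = place_fill_first(parse_machinefile(text), 2, default_slots=4)
        print(name, dict(Counter(ranks)))
```

In this model, the original file puts both ranks on the first host listed (the main node), while the fixed file puts them on the worker because it is listed first and has room for three. Whether a real launcher behaves this way depends on the MPI implementation and its mapping options, so treat this as an intuition aid only.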