Unable to run OpenMPI across more than two machines
Posted by rcollyer on Stack Overflow
Published on 2010-03-22T19:46:29Z
When attempting to run the first example in the boost::mpi tutorial, I was unable to run it across more than two machines. Specifically, this seemed to run fine:
mpirun -hostfile hostnames -np 4 boost1
with each entry in hostnames written as <node_name> slots=2 max_slots=2. But when I increase the number of processes to 5, it just hangs. I have also decreased slots/max_slots to 1, with the same result as soon as the job needs more than 2 machines. On the nodes, this shows up in the job list:
<user> Ss orted --daemonize -mca ess env -mca orte_ess_jobid 388497408 \
-mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 -hnp-uri \
388497408.0;tcp://<node_ip>:48823
Additionally, when I kill it, I get this message:
node2 - daemon did not report back when launched
node3 - daemon did not report back when launched
The cluster is set up with the MPI and Boost libraries accessible from an NFS-mounted drive. Am I running into a deadlock with NFS, or is something else going on?
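The two-machine threshold described above can be made concrete with a minimal shell sketch. It assumes that this Open MPI version maps ranks by slot (every slot on one host is filled before moving to the next host, assumed here to be the default) and uses hypothetical host names node1, node2 and node3. The helpers write_hostfile and place_by_slot are illustrative only, not Open MPI tools. The sketch models rank placement from the hostfile; it launches nothing and does not reproduce the hang.

```bash
#!/usr/bin/env bash
# Sketch only: models where ranks would land, ASSUMING by-slot mapping
# (fill all slots on a host before moving on). Host names are hypothetical.
set -euo pipefail

# Write a hostfile using the same line format as in the question:
#   <node_name> slots=N max_slots=N
write_hostfile() {
  local file=$1 slots=$2
  shift 2
  : > "$file"                      # create or truncate the file
  for node in "$@"; do
    echo "$node slots=$slots max_slots=$slots" >> "$file"
  done
}

# Print "rank R -> NODE" for np ranks, filling each host's slots in file
# order before moving to the next host. Fails if the hostfile has fewer
# slots than np (oversubscription is not modelled).
place_by_slot() {
  local file=$1 np=$2
  awk -v np="$np" '
    {
      split($2, kv, "=")           # $2 is "slots=N", so kv[2] is N
      n = kv[2] + 0
      for (s = 0; s < n && rank < np; s++)
        printf "rank %d -> %s\n", rank++, $1
    }
    END {
      if (rank < np) { print "not enough slots for -np " np > "/dev/stderr"; exit 1 }
    }
  ' "$file"
}

# Demo: three hypothetical nodes with slots=2, as in the question.
hostfile=$(mktemp)
write_hostfile "$hostfile" 2 node1 node2 node3
printf '%s\n' "-np 4:"; place_by_slot "$hostfile" 4
printf '%s\n' "-np 5:"; place_by_slot "$hostfile" 5
rm -f "$hostfile"
```

Under those assumptions, -np 4 with slots=2 places ranks only on node1 and node2, while -np 5 puts rank 4 on node3; with slots=1, node3 is already needed at -np 3. That matches the pattern in the question: the hang appears as soon as the job needs a third machine, whatever the slot count.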
© Stack Overflow or respective owner