Unable to run OpenMPI across more than two machines

Posted by rcollyer on Stack Overflow
Published on 2010-03-22T19:46:29Z Indexed on 2010/03/22 19:51 UTC

When attempting to run the first example in the boost::mpi tutorial, I was unable to run across more than two machines. Specifically, this seemed to run fine:

mpirun -hostfile hostnames -np 4 boost1

where each entry in hostnames is of the form <node_name> slots=2 max_slots=2. However, when I increase the number of processes to 5, the job just hangs. Reducing slots and max_slots to 1 gives the same result: the job hangs as soon as it needs more than two machines. On the nodes, this shows up in the job list:

<user> Ss orted --daemonize -mca ess env -mca orte_ess_jobid 388497408 \
-mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 -hnp-uri \
388497408.0;tcp://<node_ip>:48823

Additionally, when I kill it, I get this message:

node2- daemon did not report back when launched
node3- daemon did not report back when launched

The cluster is set up with the MPI and Boost libraries accessible on an NFS-mounted drive. Am I running into a deadlock with NFS, or is something else going on?
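
For reference, the hostfile layout described above can be sketched as follows. This is only an illustration under assumptions: node2 and node3 are taken from the error message, while node1 and the three-node count are hypothetical. The snippet writes the file and sums the slots= values, which shows why 4 processes fit on two machines while a fifth needs a third one.

```shell
# Write a hostfile named "hostnames" (the name passed to mpirun -hostfile above).
# Assumption: node1 is a made-up name; node2 and node3 come from the error output.
cat > hostnames <<'EOF'
node1 slots=2 max_slots=2
node2 slots=2 max_slots=2
node3 slots=2 max_slots=2
EOF

# Sum only the slots= fields (the ^ anchor skips max_slots=).
total_slots=$(awk '{
  for (i = 2; i <= NF; i++)
    if ($i ~ /^slots=/) { split($i, kv, "="); sum += kv[2] }
} END { print sum }' hostnames)

echo "total slots: $total_slots"
```

With two slots per node, -np 4 is satisfied by the first two entries alone, so -np 5 is the first run that has to start a daemon on a third machine.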
