I have successfully compiled MPAS-A and run the sample real-data test case available from the downloads page on a single core. I have two identical 16-core computers connected by an InfiniBand network. After generating a 16-core mesh decomposition file using METIS, I am able to run on 16 cores on either local machine A or remote machine B using the following:
mpirun -n 16 -hostfile 16.mac ./atmosphere_model
(where 16.mac contains just the IP address of A or B and its 16 cores: "192.168.2.2:16").
But when I create a 32-core mesh decomposition file and replace 16.mac with a 32.mac containing the following:
192.168.2.1:16
192.168.2.2:16
and launch with
mpirun -n 32 -hostfile 32.mac ./atmosphere_model
I get some strange MPI error messages and an abort. The same hostfile works just fine with WRF:
mpirun -n 32 -hostfile 32.mac ./wrf.exe
Does anybody have any idea what might be happening?
Thanks.
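
For what it's worth, here is a minimal Python sketch of a sanity check on the hostfile side: it sums the slots listed in a "host:slots" hostfile so the total can be compared with the value passed to -n. The function names are hypothetical, the format is taken only from the two 32.mac lines shown above, and treating a line with no ":count" as one slot and "#" as a comment marker are assumptions. It only confirms the hostfile adds up to 32 and says nothing about the MPI errors themselves.

```python
def parse_hostfile(text):
    """Return a list of (host, slots) pairs from 'host:slots' lines."""
    entries = []
    for raw in text.splitlines():
        # Assumption: '#' starts a comment; blank lines are ignored.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, sep, slots = line.partition(":")
        # Assumption: a host with no ':count' contributes a single slot.
        entries.append((host, int(slots) if sep else 1))
    return entries


def total_slots(text):
    """Sum the slot counts over every host in the hostfile text."""
    return sum(slots for _, slots in parse_hostfile(text))


# Contents of 32.mac as given in the post.
HOSTFILE_32 = "192.168.2.1:16\n192.168.2.2:16\n"

requested_ranks = 32  # the value passed to mpirun -n
available = total_slots(HOSTFILE_32)
print(f"hostfile provides {available} slots, {requested_ranks} requested")
```

To check a file on disk instead of the inline string, the text could be read with open("32.mac").read() and passed to total_slots in the same way.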