MPAS MPI error running on a 2 computer pool

cyclone6

New member
I have successfully compiled MPAS-A and run the sample real-data test case from the downloads page on a single core. I have 2 identical 16-core computers connected by an InfiniBand network. After generating a 16-part mesh decomposition file using METIS, I am able to run on 16 cores on either local machine A or remote machine B using the following:
mpirun -n 16 -hostfile 16.mac ./atmosphere_model (where 16.mac just contains the IP address of A or B with 16 slots: "192.168.2.2:16").

But when I create a 32-part mesh decomposition file and update the 16.mac to a 32.mac containing the following:
192.168.2.1:16
192.168.2.2:16

and launch with mpirun -n 32 -hostfile 32.mac ./atmosphere_model
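For reference, here is a quick sanity check of the hostfile before launching (it just recreates 32.mac as listed above and counts the host entries and total slots):

```shell
# Recreate the 32.mac hostfile and sanity-check it:
# expect two host entries and 32 total MPI slots.
cat > 32.mac <<'EOF'
192.168.2.1:16
192.168.2.2:16
EOF

hosts=$(grep -c : 32.mac)                          # number of host lines
slots=$(awk -F: '{s += $2} END {print s}' 32.mac)  # sum of slot counts
echo "hosts=$hosts slots=$slots"
```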

I get some strange MPI error messages and an abort. This works just fine with WRF:
mpirun -n 32 -hostfile 32.mac ./wrf.exe

Does anybody have any idea what might be happening?

Thanks..
 
Sure - here you go: (thanks for responding so quickly!)

"[wrf1.aac.local:mpi_rank_0][rdma_param_handle_heterogeneity] All nodes involved in the job were detected to be homogeneous in terms of processors and interconnects. Setting MV2_HOMOGENEOUS_CLUSTER=1 can improve job startup performance on such systems. The following link has more details on enhancing job startup performance. MVAPICH :: Job Startup Performance.
[wrf1.aac.local:mpi_rank_0][rdma_param_handle_heterogeneity] To suppress this warning, please set MV2_SUPPRESS_JOB_STARTUP_PERFORMANCE_WARNING to 1

[cli_17]: aborting job:
Fatal error in PMPI_Alltoallw:
Other MPI error, error stack:
PMPI_Alltoallw(548)................: MPI_Alltoallw(sbuf=0xeca4b70, scnts=0x7ffe89b82c20, sdispls=0x7ffe89b82b20, stypes=0x7ffe89b82a20, rbuf=0xee4e8a0, rcnts=0x7ffe89b82ba0, rdispls=0x7ffe89b82aa0, rtypes=0x7ffe89b829a0, comm=0xc4000004) failed
MPIR_Alltoallw_impl(366)...........:
MPIR_Alltoallw(335)................:
MPIR_Alltoallw_intra(171)..........:
MPIR_Waitall_impl(248).............:
MPIDI_CH3I_Progress(285)...........:
handle_read(1359)..................:
handle_read_individual(1417).......:
MPIDI_CH3I_MRAIL_Parse_header(1506): Control shouldn't reach here in prototype, header %d
(errno 154)"



I am using mvapich2-2.3.6, and as I mentioned it works fine with WRF. And it works fine with MPAS until I try to get both machines working at the same time. Something with the decomposition file and assignment of MPI tasks?
 
It occurs to me that this may be related to MPI rather than MPAS itself.
However, if you don't mind, you might want to try the following:

1. You could try running MPAS with all 32 cores on a single node (you may need to add --oversubscribe after mpirun if your MPI requires it). I understand that there may not be enough physical cores on a single node, but that does not matter here, because we are only testing that there is no issue with the decomposition file. Alternatively, you could run MPAS with 16 cores across the 2 nodes (8 cores per node) to rule out decomposition issues.

2. Does your system contain more than one MPI implementation (e.g., Intel MPI, Open MPI)? If so, using an absolute path to mpirun may solve the issue (for example, /public/home/123/mpi/mpirun -npernode 16 -n 32 -host 1,2 ./atmosphere_model).
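To make suggestion 1 concrete, here is a small sketch that computes an even per-node split for a 16-rank run across 2 nodes and prints the launch line (a dry run: the echo only prints the command, so drop it to actually execute; the .mac hostfile naming follows your convention above):

```shell
# Compute an even per-node rank split and print the mpirun line.
total=16
nodes=2
npernode=$((total / nodes))   # 8 ranks per node
echo mpirun -npernode "$npernode" -n "$total" -hostfile "${total}.mac" ./atmosphere_model
```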
 
Still no luck. I tried with 8 cores, i.e., mpirun -n 8 -hostfile 8.mac ./atmosphere_model (4 cores each node). Same error.

mpirun -n 8 -host 192.168.2.1 ./atmosphere_model runs fine, as does
mpirun -n 8 -host 192.168.2.2 ./atmosphere_model

I did notice something, though... when I start the run on the local machine, it creates the log.atmosphere.0000.out file immediately and starts quickly. When I start on the remote machine, there is a long delay before log.atmosphere.0000.out is created and the run starts. Perhaps some kind of timeout issue? This is on a mounted NFS drive.

Why does it work with WRF but not MPAS?

Perhaps I'll try Open MPI. MVAPICH2 is the only MPI implementation on my computers at the moment.
 
NFS? You mean the storage is mounted across compute nodes via a network file system?
Did you see any *.lock files after executing the mpirun command?
I recall PNETCDF or PIO having compatibility issues with NFS.
Perhaps this is why you cannot run the model on multiple nodes.

Have you set io_type="pnetcdf,cdf5" to see if the model can run on two nodes?

It looks like this:

XML:
<stream name="output"
        type="output"
        clobber_mode="overwrite"
        io_type="pnetcdf,cdf5"
        filename_template="history.$Y-$M-$D_$h.$m.$s.nc"
        output_interval="6:00:00" >
    <file name="stream_list.atmosphere.output"/>
</stream>

<stream name="diagnostics"
        type="output"
        clobber_mode="overwrite"
        io_type="pnetcdf,cdf5"
        filename_template="diag.$Y-$M-$D_$h.$m.$s.nc"
        output_interval="1:00:00" >
    <file name="stream_list.atmosphere.diagnostics"/>
</stream>


in streams.atmosphere file.
 
While I was attempting to modify the io_type parameters, I stumbled upon the fact that if I run MPAS-A on the 2 machines with only 2 MPI tasks, the run does not die with the error message listed above. (The run worked before I made the changes outlined in the last message.)

mpirun -n 2 -hostfile 2.mac ./atmosphere_model works just fine (with verification from "top" that it is indeed running on both machines).
mpirun -n 4 -hostfile 4.mac ./atmosphere_model does not work. It errors out as above.

A positive step. But I'm still at a loss. WRF runs just fine on all 32 cores across the 2 machines. I haven't had time to test out a different MPI implementation yet.
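When I do get time, a small loop may help bisect exactly where things break, since n=2 works and n=4 fails (a dry run that just prints each launch line; it assumes an N-part decomposition and an N.mac hostfile exist for each count, and removing the echo would actually launch them):

```shell
# Print the launch line for each rank count to bisect the failure.
for n in 2 4 8 16 32; do
    echo mpirun -n "$n" -hostfile "$n.mac" ./atmosphere_model
done
```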
 
Just a quick update: I finally got MPAS-A running on the 2 machines. I could not get MVAPICH2 to work, nor Intel MPI. But the MPICH 3.3.1 file provided by Michael Duda for his script worked just fine. Working as expected...
 