MPI Error while running wrf.exe

Vaishnavi_198

New member
i) I ran wrf.exe and got an error (file attached) that I do not understand. I run WRF on an HPC system, and this seems to be an MPI issue. I have attached the Slurm script too.

ii) When I run wrf.exe, I expected the error to be in the last rsl error file. However, I only found this error by randomly checking the rsl files: it was in rsl.error.0250, while the last error file was rsl.error.0383. How am I supposed to know which file contains the error?

The error:

DYNAMICS OPTION: Eulerian Mass Coordinate
alloc_space_field: domain 1 , 43597500 bytes allocated
med_initialdata_input: calling input_input
Abort(608265743) on node 250 (rank 250 in comm 0): Fatal error in PMPI_Gatherv: Other MPI error, error stack:
PMPI_Gatherv(398)..........................: MPI_Gatherv failed(sbuf=0x467b820, scount=24, MPI_CHAR, rbuf=0x7ffdc1b34880, rcnts=0xf86ee70, displs=0xf86f480, datatype=MPI_CHAR, root=0, comm=MPI_COMM_WORLD) failed

MPIDI_Gatherv_intra_composition_alpha(1491):
MPIDI_NM_mpi_gatherv(523)..................:
MPIR_Gatherv_allcomm_linear_ssend(113).....:
MPIC_Ssend(249)............................:
MPID_Ssend(720)............................:
MPIDI_ssend_unsafe(311)....................:
MPIDI_OFI_send_normal(392).................:
(unknown)(): Other MPI error
 


Go to the directory where your rsl files are located on your machine.

Run each one of these commands individually in a new terminal window.


grep -i FATAL rsl.*

grep -i error rsl.*

grep -i SIGSEGV rsl.*

grep -i cfl rsl.*

The output will show you which rsl file contains the errors.
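As a rough sketch, the four searches can also be combined into one command that prints only the names of the matching files, which addresses the "which file has the error" question directly. The function name `find_rsl_errors` is hypothetical and not part of WRF; it assumes the rsl files sit in the directory passed as the first argument (or the current directory if none is given).

```shell
# Hypothetical helper (not part of WRF): list the rsl files that contain
# any of the patterns searched for above.
find_rsl_errors() {
    dir="${1:-.}"   # run directory; defaults to the current directory
    # -i: ignore case, -l: print only the names of matching files,
    # -E: extended regex so the patterns can be joined with |
    grep -ilE 'fatal|error|sigsegv|cfl' "$dir"/rsl.* 2>/dev/null | sort
}
```

Calling `find_rsl_errors` from the run directory should print one file name per line. You can then open just those files, for example with `tail`, instead of checking every rank's output by hand.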
 
Thank you, that was helpful.
 