Hi there,
I am trying to run real.exe on Derecho for a three-domain nested simulation (namelist attached below), but real.exe appears to fail on the third domain with the following error:
Code:
cxil_map: write error
cxil_map: write error
cxil_map: write error
cxil_map: write error
MPICH ERROR [Rank 0] [job id b00ca03d-a489-44f4-ae0c-620377b3aa0d] [Fri Mar 21 23:26:58 2025] [dec0037] - Abort(539613199) (rank 0 in comm 0): Fatal error in PMPI_Scatterv: Other MPI error, error stack:
PMPI_Scatterv(416)..........: MPI_Scatterv(sbuf=0x148af85d7020, scnts=0xa0429c0, displs=0xfd322d0, MPI_CHAR, rbuf=0x7fff3a921200, rcount=17799936, MPI_CHAR, root=0, comm=comm=0xc4000000) failed
MPIR_CRAY_Scatterv(462).....:
MPIC_Isend(511).............:
MPID_Isend_coll(610)........:
MPIDI_isend_coll_unsafe(176):
MPIDI_OFI_send_normal(372)..: OFI tagged senddata failed (ofi_send.h:372:MPIDI_OFI_send_normal:Invalid argument)
aborting job:
Fatal error in PMPI_Scatterv: Other MPI error, error stack:
PMPI_Scatterv(416)..........: MPI_Scatterv(sbuf=0x148af85d7020, scnts=0xa0429c0, displs=0xfd322d0, MPI_CHAR, rbuf=0x7fff3a921200, rcount=17799936, MPI_CHAR, root=0, comm=comm=0xc4000000) failed
MPIR_CRAY_Scatterv(462).....:
MPIC_Isend(511).............:
MPID_Isend_coll(610)........:
MPIDI_isend_coll_unsafe(176):
MPIDI_OFI_send_normal(372)..: OFI tagged senddata failed (ofi_send.h:372:MPIDI_OFI_send_normal:Invalid argument)
This is my configuration for running real.exe:
Code:
...
#PBS -l select=4:ncpus=96:mem=128GB
...
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
mpirun -np 156 ./real.exe
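For reference, the arithmetic behind that request can be sketched as below. The numbers are copied from the job script above; the variable names are only illustrative (they are not PBS or WRF settings), and the even spread of ranks across nodes is an assumption, since actual placement depends on the launcher.

```shell
# Values copied from the job script above; names are illustrative only.
nodes=4            # from select=4
cpus_per_node=96   # from ncpus=96
ranks=156          # from mpirun -np 156

# Total cores requested from PBS.
total_cores=$((nodes * cpus_per_node))

# Ranks per node, ASSUMING the launcher spreads them evenly.
ranks_per_node=$((ranks / nodes))

echo "cores requested:    ${total_cores}"
echo "MPI ranks launched: ${ranks}"
echo "ranks per node:     ${ranks_per_node}"
```

So the job launches fewer MPI ranks than the cores it requests, and each node is limited to 128GB of memory shared among its ranks.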
After hitting this error with namelist.old (please ignore the start and end times) and reading about a similar earlier issue, I tried increasing the domain 1 grid size (namelist_new), but I still get the same error. Domain 3 covers exactly my area of interest, so I have avoided shrinking it. Before I experiment further with the domain 1 grid, does anyone have pointers on how to approach this? Any other insights would be helpful as well.
Thanks in advance,
Ananya