Hello,
I'm running WRF v4.6.1 with two domains (15 km and 3 km), using GFS 0.25° data as input. The model runs successfully with GFS 00z, 06z, and 18z initializations, but consistently crashes only when initialized with GFS 12z.
There is no explicit CFL or physics error in the logs. The last line in rsl.error.* is typically:
d01 YYYY-MM-DD_HH:MM:SS Input data is acceptable to use:
After that, the model stops advancing in simulation time: the MPI processes remain active, but no further output is generated. The following MPI error then appears:
recv(19) failed: Connection reset by peer (104)
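To check whether every rank stops at the same point, the last line of each rsl.error.* file can be collected with a short script. This is only a minimal sketch: the function name `last_lines` and the assumption that the script runs from the WRF run directory are mine, and it reads nothing beyond the log files themselves.

```python
from pathlib import Path


def last_lines(run_dir="."):
    """Return {filename: last non-empty line} for each rsl.error.* file in run_dir.

    `last_lines` is a hypothetical helper name; run_dir is assumed to be the
    directory where wrf.exe was launched.
    """
    result = {}
    for path in sorted(Path(run_dir).glob("rsl.error.*")):
        # Tolerate stray bytes in the logs instead of raising a decode error.
        text = path.read_text(errors="replace")
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        result[path.name] = lines[-1] if lines else ""
    return result


if __name__ == "__main__":
    # Print one line per MPI rank so ranks that stopped early stand out.
    for name, line in last_lines().items():
        print(f"{name}: {line}")
```

If one rank ends on a different line than the others, that rank's log is probably the one worth reading in full.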
I’m launching the model using:
/usr/bin/mpirun -n 26 --bind-to core --map-by core ./wrf.exe
This issue has occurred consistently since April 28.
From late February until then, I ran 12z GFS initializations without problems.
Since April 28, only a few 12z runs have completed successfully.
I’ve already tested multiple changes in namelist.input, including different combinations of PBL, microphysics, and cumulus schemes, but the issue persists.
I will attach namelist.input and example rsl.error.* files.
Has anyone seen similar behavior? Any suggestions for debugging or workarounds would be appreciated.
Thanks in advance!