WRFv4.7.1 wrf.exe (on Derecho) exits with signal errors during first time step

spacekace3005

I am attempting to run three separate WRF runs, all of which fail with the same errors: "rank XXXX died from signal 11" and "rank XXXX died from signal 15." A quick search reveals that signal 11 is a segmentation fault (SIGSEGV), while signal 15 is a termination signal (SIGTERM), which typically appears when the remaining MPI ranks are shut down after one rank has failed. Screenshots of these errors for each of the three runs are attached. The rsl.error.* and rsl.out.* files do not appear to contain any information about why the runs failed. In the past, when there was a problem with the grid decomposition due to the number of processors used, these files said so explicitly. No such message appears this time, so decomposition does not seem to be the cause.
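As a quick check on what the two signal numbers mean, the snippet below looks them up with Python's standard `signal` module. It is only a sketch: the helper name `signal_name` is made up for illustration, and the mapping shown assumes a Linux system such as the compute nodes the job ran on.

```python
import signal


def signal_name(signum):
    """Return the symbolic name for a numeric signal on this platform."""
    return signal.Signals(signum).name


# The two signal numbers reported in the "rank XXXX died from signal N" messages.
for signum in (11, 15):
    print(signum, signal_name(signum))
```

On Linux this reports SIGSEGV for 11 and SIGTERM for 15, so the rank that died from signal 11 is the one to investigate first.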

The program produces wrfout files for the first timestep for each domain before exiting. I am using the same namelist settings (with slightly modified grid locations - no change in grid size) and requesting the same number of nodes/processors (#PBS -l select=40:ncpus=128:mpiprocs=128) as in a previous case that ran successfully. The only difference is that the previous case was run using WRFv4.6.1, and I am now attempting to use WRFv4.7.1 instead. The newest version of WRF was compiled exactly the same way as the older version using the gfortran compilers. The meteorological data were processed using WPSv4.6.0. Namelist.input files for each of the four cases are attached for reference. real.exe runs without any issues.
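For reference, the number of MPI ranks implied by the PBS request above can be worked out with a short sketch like the one below. The function name `total_mpi_ranks` is hypothetical, and the code assumes that a chunk with no explicit `mpiprocs` value contributes one rank per chunk.

```python
def total_mpi_ranks(select_spec):
    """Sum chunks * mpiprocs over each '+'-separated part of a PBS select spec."""
    total = 0
    for chunk in select_spec.split("+"):
        fields = chunk.split(":")
        n_chunks = int(fields[0])
        # Remaining fields are resource=value pairs, e.g. ncpus=128.
        resources = dict(field.split("=", 1) for field in fields[1:])
        # Assumption: one rank per chunk when mpiprocs is not given.
        total += n_chunks * int(resources.get("mpiprocs", 1))
    return total


ranks = total_mpi_ranks("40:ncpus=128:mpiprocs=128")
print(ranks)  # 40 * 128 = 5120
```

With 5120 ranks, WRF writes one rsl.error.* and one rsl.out.* file per rank, which is why searching all of them (rather than only rsl.error.0000) matters.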

All four runs and associated rsl.* files can be found in my scratch folder on Derecho:
1. Successful 10 Aug 2020 run: /glade/derecho/scratch/kshourd/10Aug2020/2nd500mRUN-RESTART-test/WRF/test/em_real (the wrfout files were moved to another folder for post-processing)
2. Unsuccessful 10 Aug 2020 precursor run: /glade/derecho/scratch/kshourd/10Aug2020/precursor/WRF/test/em_real
3. Unsuccessful 12 May 2022 run: /glade/derecho/scratch/kshourd/12May2022/WRF_4.7.1mc/WRF/test/em_real
4. Unsuccessful 12 May 2022 precursor run: /glade/derecho/scratch/kshourd/12May2022/WRF_4.7.1pre/WRF/test/em_real

Thank you in advance for any help!
 

Attachments

  • SUCCESSFUL_namelist1.input
    4.3 KB · Views: 0
  • UNSUCCESSFUL_namelist2.input
    4.3 KB · Views: 1
  • UNSUCCESSFUL_namelist3.input
    4.1 KB · Views: 2
  • UNSUCCESSFUL_namelist4.input
    4.1 KB · Views: 1
  • 10Aug2020prec_Screenshot 2025-10-28 143752.png
    25.8 KB · Views: 0
  • 12May2022mc_Screenshot 2025-10-28 143354.png
    36 KB · Views: 0
  • 12May2022prec_Screenshot 2025-10-28 143651.png
    36.2 KB · Views: 0
Hi,
I looked at the rsl* files in one of your "unsuccessful" directories (specifically in
I was able to find several CFL errors, like the following:

Code:
rsl.error.3658:d02 2020-08-09_12:00:06            7  points exceeded v_cfl = 2 in domain d02 at time 2020-08-09_12:00:06 hours
rsl.error.3658:d02 2020-08-09_12:00:06 Max   W:    623   1935      4 W:  118.01  w-cfl:    4.26  dETA:    0.01
rsl.error.3658:d02 2020-08-09_12:00:06            3  points exceeded v_cfl = 2 in domain d02 at time 2020-08-09_12:00:06 hours
rsl.error.3658:d02 2020-08-09_12:00:06 Max   W:    623   1935      3 W: -153.85  w-cfl:    3.36  dETA:    0.01
rsl.error.3658:d02 2020-08-09_12:00:06          318  points exceeded v_cfl = 2 in domain d02 at time 2020-08-09_12:00:06 hours

These errors indicate model instability. See Segmentation Faults and CFL Errors for details.
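With thousands of rsl files, it can help to tally the CFL messages per domain and time rather than reading them by eye. The following is a minimal sketch, not an official WRF tool: the function name `summarize_cfl` is made up, and the regular expression is modeled only on the lines quoted above, so other WRF versions may word the message differently. Summing the counts is a rough indicator, since the same time stamp can be reported more than once.

```python
import re
from collections import defaultdict

# Pattern modeled on the quoted log lines (assumption: wording may vary by version).
CFL_RE = re.compile(
    r"(?P<count>\d+)\s+points exceeded\s+(?:v_)?cfl\s*=\s*[\d.]+\s+"
    r"in domain\s+(?P<domain>d\d+)\s+at time\s+(?P<time>\S+)"
)


def summarize_cfl(lines):
    """Return {(domain, time): total number of points reported over the CFL limit}."""
    totals = defaultdict(int)
    for line in lines:
        match = CFL_RE.search(line)
        if match:
            totals[(match["domain"], match["time"])] += int(match["count"])
    return dict(totals)


# Sample input: the three "points exceeded" lines from the excerpt above.
sample = [
    "rsl.error.3658:d02 2020-08-09_12:00:06            7  points exceeded v_cfl = 2 in domain d02 at time 2020-08-09_12:00:06 hours",
    "rsl.error.3658:d02 2020-08-09_12:00:06            3  points exceeded v_cfl = 2 in domain d02 at time 2020-08-09_12:00:06 hours",
    "rsl.error.3658:d02 2020-08-09_12:00:06          318  points exceeded v_cfl = 2 in domain d02 at time 2020-08-09_12:00:06 hours",
]
summary = summarize_cfl(sample)
for (domain, time), count in sorted(summary.items()):
    print(domain, time, count)
```

To run it on real output, the same function could be fed the lines of every file matched by `glob.glob("rsl.error.*")` in the run directory.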
 
Hi @kwerner,

Thanks for pointing this out. I was already using many of the suggested fixes for CFL errors, and have attempted the additional recommendations with no success. I am receiving the same CFL errors in the same location (e.g., in rsl.*.3658 for the 10 Aug 2020 precursor case). What I find odd is that I have successfully run models with this domain in the past for a period of extremely intense convection (the 10 Aug 2020 derecho). The period I am running now is just 12 hours earlier, when less intense convection was present. In theory, shouldn't this run be more stable than the one covering the actual derecho-producing convection?

I am re-running this case again right now with one last-ditch effort, but I am at a loss for next steps.

While struggling with these three models, I have also made additional attempts at fresh models (including new WRF installs) with slightly different domain locations/sizes. For these, I am unable to even get through ungrib.exe, which is also odd. I am processing the same RAP data I have processed numerous times before. I am able to successfully run ungrib.exe for extracting the GFS soil data, but now I receive segmentation faults immediately when processing the RAP data. Again, I have processed this EXACT data before without issue. Should I open a new thread with this issue? I was hoping to resolve the CFL errors so I wouldn't need to run new models, but I also need results soon.

The new WPS attempts are located here:
/glade/derecho/scratch/kshourd/12May2022/NEW
/glade/derecho/scratch/kshourd/12May2022/NEW_PRECURSOR

EDIT: I have raised the ungrib segfault issue in a related thread here: WPS ungrib seg fault
 
Thank you for posting a new thread about the WPS/ungrib issue! That helps us to keep things clean and easy to search/read.

As for the CFL issue - did your last test run, or did you run into the same issue? If you're still having issues, point me to the directory where you're running (if there are multiple simulations, let's just focus on one at a time), if you don't mind. Thanks!
 
Unfortunately, the CFL issues are still present. Let's focus on the 10 Aug 2020 case since I was able to run a successful simulation with the same data during a more convectively active period. Here's the folder: /glade/derecho/scratch/kshourd/10Aug2020/precursor/WRF/test/em_real

Thanks!
 