spacekace3005
New member
I am attempting three separate WRF runs, all of which fail with the same errors: "rank XXXX died from signal 11" and "rank XXXX died from signal 15." A quick search indicates that signal 11 is SIGSEGV (a segmentation fault) and signal 15 is SIGTERM, which is typically what the remaining ranks receive when the job is torn down after one rank crashes. Screenshots of these errors for each of the three runs are attached. The rsl.error.* and rsl.out.* files do not contain any information about why the runs failed. In the past, when the grid decomposition was incompatible with the number of processors, these files said so explicitly. They contain no such message this time, so I do not think the decomposition is the cause.
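With 5120 ranks there are far too many rsl files to read by hand, so for anyone who wants to repeat the check, here is a minimal sketch of how the tail of every rsl.error.* and rsl.out.* file could be scanned for crash-related messages. The keyword list and the name `scan_rsl` are my own assumptions, not anything WRF defines; the strings would need adjusting to whatever the compiler and MPI runtime actually print.

```python
import re
from pathlib import Path

# Assumed keywords (not an official list); extend as needed.
PATTERN = re.compile(r"segmentation fault|sigsegv|backtrace|fatal|cfl",
                     re.IGNORECASE)


def scan_rsl(run_dir, tail=20):
    """Return {filename: [matching lines]} from the last `tail` lines
    of every rsl.error.* and rsl.out.* file in `run_dir`."""
    hits = {}
    for path in sorted(Path(run_dir).glob("rsl.*.*")):
        # errors="replace" avoids failing on stray non-UTF-8 bytes
        lines = path.read_text(errors="replace").splitlines()[-tail:]
        matched = [ln.strip() for ln in lines if PATTERN.search(ln)]
        if matched:
            hits[path.name] = matched
    return hits


# Hypothetical usage from inside the run directory:
# for name, lines in scan_rsl(".").items():
#     print(name, lines)
```

An empty result would be consistent with what I see: the files end without any explicit error message.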
The program produces wrfout files for the first timestep of each domain before exiting. I am using the same namelist settings (with slightly modified grid locations, but no change in grid size) and requesting the same number of nodes/processors (#PBS -l select=40:ncpus=128:mpiprocs=128) as in a previous case that ran successfully. The only other difference is that the previous case was run with WRFv4.6.1, and I am now attempting to use WRFv4.7.1. The newer version of WRF was compiled exactly the same way as the older version, using the gfortran compilers. The meteorological data were processed with WPSv4.6.0. The namelist.input files for each of the four cases are attached for reference. real.exe runs without any issues.
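To confirm that the namelists really differ only in grid placement, the successful and unsuccessful files can be compared entry by entry. The following is a minimal sketch under some assumptions: it handles only the simple layout of one `key = value` per line inside `&group` ... `/` blocks, treats everything after `!` as a comment, and is not a full Fortran namelist parser. The function names are hypothetical.

```python
def parse_namelist(text):
    """Tiny parser: `&group` ... `/` blocks, one `key = value` per line."""
    values, group = {}, None
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.startswith("&"):
            group = line[1:].strip().lower()
        elif line == "/":
            group = None
        elif "=" in line and group is not None:
            key, value = line.split("=", 1)
            # split per-domain values and ignore the trailing comma
            items = [v.strip() for v in value.split(",") if v.strip()]
            values[(group, key.strip().lower())] = items
    return values


def diff_namelists(a_text, b_text):
    """Return {(group, key): (value_in_a, value_in_b)} for differing entries."""
    a, b = parse_namelist(a_text), parse_namelist(b_text)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


# Usage with two of the attached files:
# good = open("SUCCESSFUL_namelist1.input").read()
# bad = open("UNSUCCESSFUL_namelist2.input").read()
# for key, (old, new) in diff_namelists(good, bad).items():
#     print(key, old, "->", new)
```

Entries missing from one file show up as `None` on that side, which makes it easy to spot an option that exists in only one of the namelists.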
All four runs and associated rsl.* files can be found in my scratch folder on Derecho:
1. Successful 10 Aug 2020 run: /glade/derecho/scratch/kshourd/10Aug2020/2nd500mRUN-RESTART-test/WRF/test/em_real (the wrfout files were moved to another folder for post-processing)
2. Unsuccessful 10 Aug 2020 precursor run: /glade/derecho/scratch/kshourd/10Aug2020/precursor/WRF/test/em_real
3. Unsuccessful 12 May 2022 run: /glade/derecho/scratch/kshourd/12May2022/WRF_4.7.1mc/WRF/test/em_real
4. Unsuccessful 12 May 2022 precursor run: /glade/derecho/scratch/kshourd/12May2022/WRF_4.7.1pre/WRF/test/em_real
Thank you in advance for any help!
Attachments
- SUCCESSFUL_namelist1.input (4.3 KB)
- UNSUCCESSFUL_namelist2.input (4.3 KB)
- UNSUCCESSFUL_namelist3.input (4.1 KB)
- UNSUCCESSFUL_namelist4.input (4.1 KB)
- 10Aug2020prec_Screenshot 2025-10-28 143752.png (25.8 KB)
- 12May2022mc_Screenshot 2025-10-28 143354.png (36 KB)
- 12May2022prec_Screenshot 2025-10-28 143651.png (36.2 KB)