forrtl: error (78): process killed (SIGTERM) after first output time; runwrf.csh with ERA5 initial conditions

kjgillett

New member
Hello all,

I am attempting to run WRFv4.3.1 with ERA5 initial conditions. I am able to get through WPS with no issues and real.exe "completes without issue". However, upon running runwrf.csh, I am only able to get through the first output time for each domain, followed by "forrtl: error (78): process killed (SIGTERM)".

I see no other tell-tale error messages in the rsl.error files after runwrf.csh. However, I re-ran real.exe and examined the rsl.error file and noticed that it doesn't seem to actually get through everything it should for d02 and d03?

More background: I am using ERA5 initial conditions with three nested domains, starting at a grid spacing of 27000 m with a nest ratio of 3 for each domain, on the UCAR Derecho supercomputer.

Thank you!
 

Attachments

  • error_files.zip
    6.8 KB · Views: 3

Good afternoon,

A couple of suggestions:

1. Add this command before real.exe to see if it's a memory problem:
Code:
ulimit -s unlimited

2. I don't remember 100%, but I vaguely recall that 4.3.1 may need the grid spacing (dx, dy) defined explicitly for every domain, whereas newer versions can do the calculation for you.

Code:
 dx                                  = 27000, 9000, 3000,
 dy                                  = 27000, 9000, 3000,

3. radt may be too small. Best practice is usually about 1 minute per km of the parent domain's grid spacing, so all domains could be:
Code:
 radt                                = 27,     27,     27,

4. gwd_opt might need to be the same across all domains, though I'm not 100% certain. That would mean changing:

Code:
 gwd_opt                             = 1,      0,      1,

to

 gwd_opt                             = 1,      1,      1,

5. Depending on the resolution of the ERA5 data for this older dataset, you may not need the 27/9/3 km domains and could just start at 9 km and nest down to 3 km. The ratio between the boundary-condition data and the first WRF domain can be 1:3 or 1:5, if I remember correctly.

6. The namelist file you uploaded in the zip file doesn't have any run-time information:
Code:
 run_days                            = 0,
 run_hours                           = 0,
 run_minutes                         = 0,
 run_seconds                         = 0,
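
The arithmetic behind suggestions 2 and 3 can be sketched in shell. This is only an illustration under stated assumptions: the function names nest_spacing and radt_minutes are made up here and are not part of WRF or WPS, and the 27000 m parent spacing and ratio of 3 are taken from the original post.

```shell
#!/bin/sh
# Hypothetical helpers (not part of WRF or WPS) for the arithmetic in
# suggestions 2 and 3.

# Print the grid spacing in metres for each domain as a namelist-style
# list, given the parent spacing, the nest ratio, and the domain count.
nest_spacing() {
    dx=$1; ratio=$2; ndom=$3
    out=""
    i=1
    while [ "$i" -le "$ndom" ]; do
        out="${out}${dx}, "
        dx=$((dx / ratio))      # integer division: each nest is dx/ratio
        i=$((i + 1))
    done
    printf '%s\n' "${out% }"    # drop the trailing space
}

# Rule of thumb quoted above: 1 minute of radt per km of parent spacing.
radt_minutes() {
    echo $(( $1 / 1000 ))
}

echo " dx   = $(nest_spacing 27000 3 3)"
echo " radt = $(radt_minutes 27000)"
```

The printed values are only a cross-check against what is already in namelist.input; the script does not edit the namelist.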


A couple of questions:

1. How many cores are you using when running real.exe, i.e. the value you pass to mpirun, as in mpirun -np #cores ./real.exe?
2. Can you upload all your rsl.error and rsl.out files in a zip file the next time you run it? There are a few commands I know of that help identify issues in specific files:

Code:
grep -i FATAL rsl.*

grep -i error rsl.*

grep -i SIGSEGV rsl.*

grep -i cfl rsl.*
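
If it helps, the four searches above can be rolled into one pass that prints a count per keyword. This is just a rough sketch: scan_rsl is a made-up helper name, and it assumes the rsl.* files sit in the directory you pass in (or the current directory).

```shell
#!/bin/sh
# Hypothetical helper: count lines matching each keyword (case-insensitive)
# across all rsl.* files in a directory. Defaults to the current directory.
scan_rsl() {
    dir=${1:-.}
    for pat in FATAL error SIGSEGV cfl; do
        # cat first so that grep -c reports one total, not one count per file
        n=$(cat "$dir"/rsl.* 2>/dev/null | grep -ic "$pat")
        printf '%-8s %s\n' "$pat" "$n"
    done
}

scan_rsl "${1:-.}"
```

A non-zero count only tells you which keyword to follow up on; the individual grep commands are still needed to see which rsl file and which line it came from.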
 

Thanks for your reply William!

I made the suggested changes to my namelist.input file, but the issue persists (same as described above). Attached is a zip of all of my rsl.* files and the updated namelist.input. I ran into no issues with
Code:
ulimit -s unlimited
 

Attachments

  • error_files.zip
    174 bytes · Views: 2