
forrtl: error (78) and Timestep

Henry18

Hi,

I'm running WRF and repeatedly encounter model crashes before the walltime expires. The rsl.error.* files show the same error whether I start from the initial run or from a restart:

forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libc.so.6 000014557F256900 Unknown Unknown Unknown
wrf.exe 00000000025DAAF8 Unknown Unknown Unknown
wrf.exe 00000000025D52CD Unknown Unknown Unknown
wrf.exe 0000000001D8FF5A Unknown Unknown Unknown
wrf.exe 0000000001F790C9 Unknown Unknown Unknown
wrf.exe 00000000017292FB Unknown Unknown Unknown
wrf.exe 00000000014FBAE8 Unknown Unknown Unknown
wrf.exe 00000000005B97B3 Unknown Unknown Unknown
wrf.exe 00000000004174B1 Unknown Unknown Unknown
wrf.exe 0000000000417471 Unknown Unknown Unknown
wrf.exe 000000000041740D Unknown Unknown Unknown
libc.so.6 000014557F23FE6C Unknown Unknown Unknown
libc.so.6 000014557F23FF35 __libc_start_main Unknown Unknown
wrf.exe 000000000041733A Unknown Unknown Unknown

I found a forum thread suggesting that reducing the timestep can resolve this error (forrtl: error (78): process killed (SIGTERM)). Although I haven't seen any CFL warnings, I tried reducing the timestep anyway: for a 7-day run I started with time_step = 18 s and then, on a restart at day 4, reduced it, testing 12, 9, and finally 6 s.
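Changing the timestep for a restart, as described above, can be scripted. The following is a minimal sketch, assuming GNU sed and a standard namelist.input with a single "time_step = N," entry in &domains and a single "restart = .false.," (or .true.) entry in &time_control. The helper name set_restart_timestep is hypothetical. It does not touch the start_* dates in &time_control, which still have to be moved to the restart time by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set a new integer time_step and switch to a restart run.
# Assumes GNU sed (-i, -E) and one "time_step =" and one "restart =" line.
set_restart_timestep() {
  local namelist="$1" new_dt="$2"
  cp "$namelist" "${namelist}.bak"   # keep the previous settings

  # Replace the integer after "time_step ="; time_step_fract_num etc. do not
  # match because the pattern requires "=" right after "time_step" (plus spaces).
  sed -i -E "s/^([[:space:]]*time_step[[:space:]]*=[[:space:]]*)[0-9]+/\1${new_dt}/" "$namelist"

  # Set restart = .true.; restart_interval does not match for the same reason.
  sed -i -E 's/^([[:space:]]*restart[[:space:]]*=[[:space:]]*)\.[A-Za-z]+\./\1.true./' "$namelist"
}
```

For example, set_restart_timestep namelist.input 12 rewrites time_step to 12 and keeps the old file as namelist.input.bak.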

My questions:
1. Is it common practice in WRF to gradually reduce the timestep during a long simulation or across restarts?
2. For long-term simulations with multiple restarts, should I expect to eventually need very small timesteps (e.g., 3 s or 1 s)?
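For question 2, a useful reference point is the rule of thumb in the WRF namelist documentation: time_step (in seconds) of about 6 × dx (in km), lowered only if CFL problems appear. A minimal sketch of that arithmetic, taking dx in metres as it is written in namelist.input (the function name suggest_time_step is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: rule-of-thumb time_step (s) ~ 6 x dx (km).
# dx is given in metres, as in namelist.input; the result is rounded down.
suggest_time_step() {
  local dx_m="$1"
  echo $(( 6 * dx_m / 1000 ))
}
```

For example, a 3 km grid (dx = 3000) gives 18 s and a 1 km grid gives 6 s; the dx of the run discussed here is not stated in the thread.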

For reference, I’ve attached two rsl.error.* files. My working directory is: /glade/derecho/scratch/hhou/Test_ERA5/WRF/test/em_real

Thank you for your help!
Henry
 

Attachments

  • rsl.error.00001.txt (5.4 KB)
  • rsl.error.00002.txt (977.1 KB)
Bash:
grep -i FATAL rsl.*

grep -i error rsl.*

grep -i SIGSEGV rsl.*

grep -i cfl rsl.*

Run these commands in the /run folder that contains all the rsl.out and rsl.error files and see whether any of them return anything.

Then upload the output here as a zip file.
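The four searches above can be run in one go. A minimal sketch, assuming bash and that it is called from the directory holding the rsl.* files; the function name collect_rsl_diagnostics and the grep_<pattern>.txt output names are hypothetical, and zip is assumed to be installed (a .tar.gz is written otherwise):

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the four suggested greps over rsl.* in the current
# directory, save each result to its own file, and bundle them for upload.
collect_rsl_diagnostics() {
  local out_dir="${1:-rsl_diagnostics}"
  mkdir -p "$out_dir"
  local pattern
  for pattern in FATAL error SIGSEGV cfl; do
    # -i: case-insensitive, as in the commands above.
    # "|| true" keeps going when a pattern has no matches.
    grep -i "$pattern" rsl.* > "$out_dir/grep_${pattern}.txt" 2>/dev/null || true
  done
  # Prefer zip; fall back to tar if zip is not available.
  if command -v zip >/dev/null 2>&1; then
    zip -qr "${out_dir}.zip" "$out_dir"
  else
    tar -czf "${out_dir}.tar.gz" "$out_dir"
  fi
}
```

Running collect_rsl_diagnostics produces rsl_diagnostics/grep_FATAL.txt and the other three files plus an archive to attach; an empty file means that pattern never matched.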
 
Hi William,

Thank you for the reply! I ran the diagnostic commands in my WRF run directory. These two returned a lot of output:

grep -i FATAL rsl.*

grep -i error rsl.*

I have attached the output of the two commands in .txt files (named according to the respective command).

For context: I submitted a new simulation last night with the following settings, and it has now been running successfully for more than 4 hours:

#PBS -l select=16:ncpus=36:mpiprocs=36:mem=64GB

Also, time_step = 6 (seconds).
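The PBS request above works out as follows. This is plain shell arithmetic on the values in the #PBS line; it assumes PBS Pro semantics, where mem in a select statement is requested per chunk, and the per-rank figure is just an even split:

```shell
#!/usr/bin/env bash
# Values copied from: #PBS -l select=16:ncpus=36:mpiprocs=36:mem=64GB
chunks=16       # select=16
mpiprocs=36     # MPI ranks per chunk
mem_gb=64       # memory per chunk

ranks=$(( chunks * mpiprocs ))                    # total MPI ranks
total_mem_gb=$(( chunks * mem_gb ))               # memory summed over all chunks
mem_per_rank_mb=$(( mem_gb * 1024 / mpiprocs ))   # even split, rounded down

echo "MPI ranks:            $ranks"               # 576
echo "total memory (GB):    $total_mem_gb"        # 1024
echo "memory per rank (MB): $mem_per_rank_mb"     # 1820
```

WRF writes one rsl.out/rsl.error pair per MPI rank, so this run produces 576 of each, which is why the grep commands search rsl.* rather than a single file.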

Because the new run has been progressing normally so far, I’m not certain whether the log files I checked (and attached) actually correspond to the previous crash, or whether the issue was inadvertently resolved by restarting in a clean environment.
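A new run in the same directory overwrites the rsl.* files, so the logs of a crashed run are lost once the job is resubmitted. A minimal sketch that moves them into a timestamped folder before resubmitting (bash; the function name archive_rsl_logs and the rsl_logs_<timestamp> naming are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: move rsl.out.* / rsl.error.* from a run directory into
# a timestamped subfolder so the next wrf.exe run cannot overwrite them.
archive_rsl_logs() {
  local run_dir="${1:-.}"
  local dest="$run_dir/rsl_logs_$(date +%Y%m%d_%H%M%S)"
  local restore
  restore=$(shopt -p nullglob)     # remember the caller's nullglob setting
  shopt -s nullglob                # unmatched globs expand to nothing
  local logs=("$run_dir"/rsl.out.* "$run_dir"/rsl.error.*)
  eval "$restore"
  if (( ${#logs[@]} == 0 )); then
    echo "no rsl.* files in $run_dir" >&2
    return 1
  fi
  mkdir -p "$dest"
  mv "${logs[@]}" "$dest"/
  echo "$dest"                     # print where the logs went
}
```

Calling archive_rsl_logs . before each qsub keeps every run's logs separate, which makes it clear which rsl.error files belong to which crash.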

Thank you!
Henry
 

Attachments

  • error message.zip (45.5 KB)