WRF process terminated badly; aborted

spangrude

New member
Hello,

I am using Slurm to submit WRF runs to a cluster. My first model run works fine. For the second run I use essentially the same namelist options, changing only the date, run time, and lat/lon coordinates, but this run always stops after 55 model seconds. The slurmjob.log shows one of two errors, with no apparent pattern to which one I get: either a segmentation fault, or an error that says "bad termination of one of your application processes". I do see some CFL errors in the rsl files, but I wouldn't normally expect them this early in the model time, and I would also expect them to cause the same failure consistently.

I am using version 4.3.3.
Attached is my namelist.input and the error logs.

Does anyone know what the issue might be?

Thank you!
 

Attachments

  • error.tar.gz (61.5 KB)
  • namelist.input (4.1 KB)
  • slurmjob.log (2.9 KB)

Hi,
The fact that the failure is not consistent is odd and could be system- or environment-related. However, the CFL errors you are seeing are the likely cause of the crash. They can occur at any point during a simulation, including near the very beginning of a run, and this particular domain may simply be less stable than the previous one. I would advise reducing time_step to about 4×DX (with DX in km) to see if that gets you past the issue. If not, take a look at this FAQ that discusses some other potential solutions for CFL errors.
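
As a rough illustration of the 4×DX suggestion, here is a minimal Python sketch that turns a grid spacing into a candidate time_step. The function name suggested_time_step and the example grid spacings are hypothetical and are not taken from the attached namelist.input. It assumes DX is given in meters (as in the namelist) and that the multiplier applies to DX expressed in km.

```python
import math


def suggested_time_step(dx_m: float, factor: float = 4.0) -> int:
    """Return a candidate time_step in whole seconds as factor * DX (DX in km).

    dx_m   -- grid spacing of the coarsest domain in meters (hypothetical input)
    factor -- multiplier applied to DX in km; 4.0 follows the suggestion above
    """
    if dx_m <= 0:
        raise ValueError("dx_m must be positive")
    dx_km = dx_m / 1000.0
    # Round down to a whole number of seconds and never return less than 1.
    return max(1, math.floor(factor * dx_km))


if __name__ == "__main__":
    # Example grid spacings only; substitute the dx value from your own namelist.
    for dx in (9000.0, 3000.0):
        print(f"dx = {dx:.0f} m -> time_step of about {suggested_time_step(dx)} s")
```

The result would go into time_step in the &domains section of namelist.input. Treat it as a starting point only; if CFL errors persist, a smaller value may still be needed.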
 