WRF v4 reanalysis runs dying

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

Can you provide more information about those failed cases? Any error messages, any weird values in wrfout, when did the model crash, etc.?
 
I'm afraid I don't have any new information for you. The runs always die without giving any kind of error message. The job keeps running though.

Now I am unable to reliably reproduce a crash. Sometimes a restarted run will die at the same point, sometimes not. Normally this might indicate some sort of hardware issue. But I don't think that's the case because it only happens with a certain physics setting, and not at a history dump time. It's also been reproduced on a different system.

To recap:
  • This problem happens only with sf_sfclay_physics=1, and it always happens with that setting. It never happens if it is set to something else.
  • The runs always die without giving any error message. The job keeps running but does not produce any further output.
  • This problem occurs in 3.9.1, 4.0, and 4.0.1.
  • These are long-running nested grid runs with 3 grids. The failures occur after 5 to 30 days of integration.
  • The crashes happen between history dumps, so it does not seem to be an I/O issue.
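
Because the job keeps running with no error message, one way to at least pin down when a run hangs is to watch how long it has been since the log was last written. Below is a minimal Python sketch of that idea, not anything that ships with WRF. It assumes the run writes to a log file such as rsl.out.0000 in the run directory. The 30-minute threshold and the function names are hypothetical choices for illustration.

```python
import os
import time


def seconds_since_update(path, now=None):
    """Return the number of seconds since `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def is_stalled(path, max_idle_seconds=1800, now=None):
    """Return True if `path` has not been modified for more than max_idle_seconds."""
    return seconds_since_update(path, now) > max_idle_seconds


if __name__ == "__main__":
    log_path = "rsl.out.0000"  # assumed log file name; adjust for your run
    if not os.path.exists(log_path):
        print(f"{log_path} not found")
    elif is_stalled(log_path):
        idle = seconds_since_update(log_path)
        print(f"{log_path} has been idle for {idle:.0f} s; the run may be hung")
    else:
        print(f"{log_path} is still being updated")
```

Running this periodically (for example from cron) would record roughly when output stopped, which can then be compared with the last model time step written to the log.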

I wish I could give you more to go on.
 
I will talk to our experts about this issue and get back to you once we have some ideas about what is wrong.
 