I am running WRF over the same domain for many individual, randomly selected days.
In 90% of the runs, WRF completes successfully in a reasonable amount of time on my cluster (128 cores per run).
But in the remaining 10%, the model runs normally for a random length of time (sometimes nearly to the end of the run) and then stalls. It never advances to the next time stamp, yet the process is still running and no errors are reported.
Any tips for how I might debug this behavior?
I suspect the complex terrain (Rocky Mountains) is causing the problem, but that is only a hunch based on combing through prior forum posts. My namelist parameters are in line with the settings that resolved the WRF crashes reported in those posts.