Usually I can detect WRF runtime errors such as CFL violations because the rsl.error.* and rsl.out.* log files leave a breadcrumb trail of what may have gone wrong. If a problem occurs, I can grep for keywords like "error", "fatal", or "cfl" and find the problem in these logs.
It is harder, though, when WRF fails silently.
I have a WRF run that fails at *almost* the same time every time I restart it (within about 7 seconds of previous failed jobs). When it fails, there are no messages in the rsl files that show me what went wrong.
I have been troubleshooting with the CISL helpdesk for the Cheyenne cluster, and they don't see anything obviously wrong with the job setup. I have other WRF jobs with identical namelists, except for the time frame, that completed successfully.
Am I overlooking other keywords in the rsl files that could point me to the problem? Has anyone else encountered a "silent" WRF failure? Why would WRF not write an error to the log?
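For anyone who wants to reproduce the keyword search described above, here is a minimal sketch in Python. It assumes the logs sit in the run directory under the usual rsl.error.* and rsl.out.* names. The function names `scan_lines` and `scan_rsl_files` are made up for illustration, and the keyword list is just the three words mentioned above.

```python
import re
from pathlib import Path

# Keywords taken from the post; extend this tuple to try other search terms.
KEYWORDS = ("error", "fatal", "cfl")
PATTERN = re.compile("|".join(KEYWORDS), re.IGNORECASE)


def scan_lines(lines):
    """Return (line_number, text) pairs for lines containing any keyword."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


def scan_rsl_files(run_dir="."):
    """Scan every rsl.* file in run_dir (assumed naming: rsl.error.NNNN, rsl.out.NNNN)."""
    hits = {}
    for path in sorted(Path(run_dir).glob("rsl.*")):
        # errors="replace" keeps the scan going if a log contains stray bytes.
        with open(path, errors="replace") as handle:
            matches = scan_lines(handle)
        if matches:
            hits[path.name] = matches
    return hits


if __name__ == "__main__":
    for name, matches in scan_rsl_files().items():
        for number, text in matches:
            print(f"{name}:{number}: {text}")
```

Run from the WRF run directory, this prints each matching line with its file name and line number, which is the same information a case-insensitive grep over rsl.* would give. In the failing run it comes back empty.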
WRF version 3.9.1
Thanks for your help!