
Real.exe stuck on particular met_em file, no errors

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

gabriel_bromley

New member
I am trying to run a 12 km 'test' run of WRF using ERA5 input data. I have moved all of the GRIB files over from HPSS, and ungrib and metgrid both ran without errors. However, real.exe is not completing. It runs fine until a particular met_em file, and then Cheyenne kills the job before real.exe finishes. I gave it the full 12 hours to run, and it still didn't complete. I cannot find any errors in any of the output files that point me toward a fix. Attached are my rsl* files, my namelist, and the job output from Cheyenne. Thanks for any insight!
 

Attachments

  • namelist.input (113.7 KB)
  • rsl_files.tar.gz (22.8 MB)
  • real_gbromley.job.txt (7 KB)
Hi,
Have you tried running real.exe for just the time period on which it is stopping? For example, your run seems to be hanging at 2013-05-03_12:00:00, so you could run real.exe from 2013-05-03_00 to 2013-05-04_00 to see if it still stops in the same place. If it does, that points to a problem with the data. If not, it points to a problem with the environment. Let me know - thanks!
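As a rough illustration of the suggestion above, the sketch below builds the start/end entries of the &time_control namelist block for a one-day window around the hang time. The variable names (start_year, end_hour, and so on) are standard WRF namelist entries, but the helper functions are hypothetical, and the sketch assumes a single domain; a multi-domain namelist needs one value per domain on each line.

```python
from datetime import datetime, timedelta


def time_control_window(center, hours_before=12, hours_after=12):
    """Return &time_control start/end values for a window around `center`.

    Hypothetical helper; assumes a single domain (one value per entry).
    """
    start = center - timedelta(hours=hours_before)
    end = center + timedelta(hours=hours_after)
    fields = {}
    for prefix, t in (("start", start), ("end", end)):
        fields[f"{prefix}_year"] = t.year
        fields[f"{prefix}_month"] = t.month
        fields[f"{prefix}_day"] = t.day
        fields[f"{prefix}_hour"] = t.hour
    return fields


def format_namelist(fields):
    """Render the values as namelist-style lines for pasting into namelist.input."""
    return "\n".join(f" {name:<12} = {value}," for name, value in fields.items())


# Hang time reported in the thread: 2013-05-03_12:00:00
window = time_control_window(datetime(2013, 5, 3, 12))
print(format_namelist(window))
```

With the default 12 hours on either side, the window runs from 2013-05-03_00 to 2013-05-04_00, matching the range suggested in the reply.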
 
I took your advice and ran real for a shorter time period and it succeeded. I ran real for just a day and then also for 2 months (centered around the problem date) and they both ran fine. The 2 month real run took less than an hour so I am going to try the full time period again and see if it magically works.
 
Real.exe fails again when run starting from the beginning. I can definitely run shorter chunks of time for now, but it would be great to be able to run the entire year. Any thoughts on diagnosing this problem?
 
One more test (if you haven't done so already) is to see if it always stops after a certain number of process hours (regardless of start time). For instance, if you started running real from March 1, instead, would it run to the beginning of July?
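One way to make that comparison concrete is to compute how many simulated hours real.exe got through before stopping, for each run. This is only a sketch: the 1 January start below is an assumption (the thread does not state the original start date), and hours_processed is a hypothetical helper.

```python
from datetime import datetime

# Timestamp format used in met_em file names and rsl logs
WRF_FMT = "%Y-%m-%d_%H:%M:%S"


def hours_processed(start_stamp, last_stamp):
    """Simulated hours between the run start and the last time real.exe reached."""
    start = datetime.strptime(start_stamp, WRF_FMT)
    last = datetime.strptime(last_stamp, WRF_FMT)
    return (last - start).total_seconds() / 3600.0


# Assumed start date (not given in the thread); hang time is from the thread.
first_run = hours_processed("2013-01-01_00:00:00", "2013-05-03_12:00:00")
print(f"first run stopped after {first_run:.0f} simulated hours")
```

If a run started on a different date stops after roughly the same number of hours, the cause is more likely the environment than the input data.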

It's possible this is a disk space problem, so check your available disk space. If space is the issue and you have another directory available with more room, you could try directing the output there, or simply running there. If that's not the problem, or not an option, I would recommend reaching out to the Cheyenne support team at CISL to see if they have any ideas about an environment setting that may be causing the problem.
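A quick check of free space in the run directory can be sketched with Python's standard library. Note the assumption here: shutil.disk_usage reports space on the underlying filesystem, not a per-user quota, so on a shared system the quota may be the tighter limit. The helper names and the 50 GiB threshold are illustrative only.

```python
import shutil


def free_gib(path="."):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, needed_gib):
    """True if the filesystem holding `path` has at least `needed_gib` free."""
    return free_gib(path) >= needed_gib


print(f"{free_gib('.'):.1f} GiB free in the current directory")
if not has_room(".", 50):  # 50 GiB is an arbitrary example threshold
    print("Consider running in, or writing output to, a larger directory")
```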
 
Hi Kelly,

I took your advice. The problem does seem to follow the movement of the start date, indicating it's not a data issue. I have plenty of space on Cheyenne, so I reached out to CISL to figure this problem out. In the meantime, is it possible to restart real.exe at a certain time step so that it completes the entire simulation period?

Thanks,

Gabe
 
Gabe,
Unfortunately, there is no restart option for the real program. However, you can run real to output wrfbdy/wrfinput files for chunks of time, then run wrf, making sure to create a wrfrst file at the time your next wrfbdy file starts, and then continue wrf as a restart. The workflow would be (e.g., running 2 weeks):
1. Run real.exe for week 1. The output is wrfinput_d0* and wrfbdy_d01 (either rename these or store them elsewhere to prevent them from being overwritten).
2. Run real.exe for week 2. The output is the same as above (again, rename or store elsewhere).
3. Run wrf.exe for 1 week, using the real.exe output from week 1, with restart_interval set to output wrfrst_d0* files at the final time.
4. Move the real.exe output for week 2 into the running directory.
5. Run wrf.exe as a restart, using the wrfbdy_d01 file for week 2 along with the wrfrst_d0* file(s) for the initial time, which is also the ending time of week 1.
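The chunked workflow above can be planned with a short script. The sketch below only computes the chunk boundaries and the matching restart settings; it does not run real.exe or wrf.exe. The dates are hypothetical, plan_chunks and describe are illustrative helpers, and the sketch assumes restart_interval is given in minutes (its default unit in namelist.input).

```python
from datetime import datetime, timedelta

WRF_FMT = "%Y-%m-%d_%H:%M:%S"


def plan_chunks(start, end, chunk_days=7):
    """Split [start, end] into consecutive chunks; each chunk ends where the next begins."""
    chunks = []
    t0 = start
    while t0 < end:
        t1 = min(t0 + timedelta(days=chunk_days), end)
        chunks.append((t0, t1))
        t0 = t1
    return chunks


def describe(chunks):
    """One summary line per chunk: real.exe period plus wrf.exe restart settings."""
    lines = []
    for i, (t0, t1) in enumerate(chunks, start=1):
        # Write a wrfrst file exactly at the end of the chunk.
        minutes = int((t1 - t0).total_seconds() // 60)
        restart = ".false." if i == 1 else ".true."
        lines.append(
            f"chunk {i}: real.exe {t0.strftime(WRF_FMT)} -> {t1.strftime(WRF_FMT)}; "
            f"wrf.exe restart = {restart}, restart_interval = {minutes}"
        )
    return lines


# Hypothetical two-week period, split into two one-week chunks
chunks = plan_chunks(datetime(2013, 1, 1), datetime(2013, 1, 15))
for line in describe(chunks):
    print(line)
```

Only the first wrf.exe run starts from wrfinput_d0*; every later chunk starts from the wrfrst_d0* file written at the end of the previous one, paired with that chunk's own wrfbdy_d01.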
 
This is great, thank you! Would it be advantageous to save the wrfinput files after the model progresses past that period? I am hoping to optimize space. real.exe is pretty cheap to run so my thought would be that deleting the files would be fine. Thanks!
 
It should be okay to delete them. As you said, if you had to run real.exe again, it wouldn't be that time-consuming, so it shouldn't be a big deal, especially if it's saving space for you!
 