
WRF Restart Run Is Getting Stuck and Not Proceeding for Irrigation Options

ahmedbably

Hi WRF community,

I have a simulation for which I have always been able to do restart runs. It consists of two scenarios: Control and Irrigation.
The irrigation run only utilizes existing irrigation options in the WRF namelist.
I have previously done restart runs for both scenarios using a certain set of physics parameterizations.
I found that those parameterizations were not the best for my case study, so I changed them slightly, keeping the same land surface model as before (Noah).
However, now the following happens:
The Control run works fine, and restarting it also works fine.
The Irrigation run, however, does not restart as expected. It gets stuck on the first restart run, after the initial run stops on the 9th day of the simulation.
It keeps printing the line

d01 2006-09-09_00:00:00 Input data is acceptable to use: wrfrst_d02_2006-09-09_00:00:00

without doing anything else, even though the job status is "Running".
This is strange, and it has never happened before. I contacted the HPC support team, and they said nothing changed on their side. I even copied the same WRF installation from the Control run to the Irrigation run, adding only the irrigation static files and the irrigation options in the namelist as before, but it still doesn't restart.
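A quick sanity check before resubmitting a stuck restart is to confirm that a restart file exists, and is not empty, for every domain at the restart time. The sketch below assumes the default restart naming `wrfrst_d<NN>_<YYYY-MM-DD_HH:MM:SS>` (different if `rst_inname`/`rst_outname` were changed) and assumes `max_dom = 2`, based only on the log line mentioning d02. It cannot explain why WRF hangs; it only rules out missing or zero-byte files.

```python
from datetime import datetime
from pathlib import Path


def expected_restart_files(restart_time: datetime, max_dom: int) -> list[str]:
    """Default WRF restart file names (wrfrst_d<NN>_<date>) for d01..d<max_dom>."""
    stamp = restart_time.strftime("%Y-%m-%d_%H:%M:%S")
    return [f"wrfrst_d{dom:02d}_{stamp}" for dom in range(1, max_dom + 1)]


def missing_or_empty(run_dir: Path, restart_time: datetime, max_dom: int) -> list[str]:
    """Expected restart files that are absent from run_dir or have zero size."""
    problems = []
    for name in expected_restart_files(restart_time, max_dom):
        path = run_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Restart time taken from the log line in this thread; max_dom = 2 is an
    # assumption based on the log mentioning d02.
    restart_time = datetime(2006, 9, 9, 0, 0, 0)
    print(expected_restart_files(restart_time, 2))
    print(missing_or_empty(Path("."), restart_time, 2))
```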

I'm using WRF v4.4, running on the HPC system at the Cyprus Institute.
I have attached the namelists for both the Control and Irrigation runs, as well as the rsl.error and rsl.out files. There are no errors, which is what makes this so strange: it's just stuck reading a file. That's why I kindly need your help.

Also, if there's no apparent solution to this problem, what is an alternative to a WRF restart run?
Is setting the start time in the namelist to the cutoff time, re-running real.exe, then WRF, and so on until the simulation finishes, equivalent to restarting the run? If not, is there another alternative to restart runs?
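The difference between the two approaches shows up in the `&time_control` section of the namelist. With `restart = .true.`, WRF reads the model state from the `wrfrst_d0N_*` files and continues the same integration, still reading lateral boundaries from the original `wrfbdy_d01`; real.exe is not rerun. With `restart = .false.` and real.exe rerun for the new start time, WRF starts from fresh `wrfinput` analyses, i.e. a new simulation. The sketch below only writes the date and restart entries (the rest of `&time_control` would be carried over from the existing namelist); the helper name and the end date are hypothetical.

```python
from datetime import datetime


def time_control_dates(start: datetime, end: datetime, max_dom: int,
                       restart: bool, restart_interval_min: int) -> str:
    """Date and restart entries of a WRF &time_control block (partial sketch)."""

    def per_domain(value: int) -> str:
        # Date entries in &time_control are given once per domain.
        return ", ".join([str(value)] * max_dom) + ","

    lines = ["&time_control"]
    for prefix, moment in (("start", start), ("end", end)):
        for field in ("year", "month", "day", "hour"):
            lines.append(f" {prefix}_{field} = {per_domain(getattr(moment, field))}")
    lines.append(f" restart = {'.true.' if restart else '.false.'},")
    # restart_interval is given in minutes (1440 = one restart file per day).
    lines.append(f" restart_interval = {restart_interval_min},")
    lines.append("/")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical segment from the 2006-09-09 restart time to 2006-09-18.
    seg_start, seg_end = datetime(2006, 9, 9), datetime(2006, 9, 18)
    # Restart: continues from wrfrst files; real.exe is not rerun.
    print(time_control_dates(seg_start, seg_end, 2, True, 1440))
    # Reinitialization: same dates, but real.exe must be rerun first.
    print(time_control_dates(seg_start, seg_end, 2, False, 1440))
```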
 

Attachments

  • rsl.error.0000 (3.3 KB)
  • rsl.out.0000 (3.4 KB)
  • namelist (irrigation).input (5.4 KB)
  • namelist (Control).input (5 KB)
Hi,
Apologies for the long delay in response. Before we try to address this issue, I should first note that your domain sizes are entirely too small. The settings for e_we and e_sn should never be any smaller than 100x100 grid spaces to obtain reasonable results. Although you've previously simulated domains of that size, we ask that you increase your domain sizes.

After you increase the size, if you are still getting this same issue, then please share your new namelist files, as well as your rsl* files (please package all of them together and share that as a single *.tar file). If you get a different issue, please let me know here and then create a new thread with the new problem.
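The 100x100 minimum for `e_we` and `e_sn` can be checked directly against a namelist. This is a minimal sketch that reads the per-domain lists from `&domains` with a regular expression; it assumes simple `name = v1, v2,` lines, drops a trailing `!` comment, and does not handle Fortran repeat counts such as `2*100`. The helper names and the sample values are hypothetical, not taken from the attached namelists.

```python
import re

MIN_POINTS = 100  # minimum e_we / e_sn recommended in this thread


def parse_int_list(namelist_text: str, name: str) -> list[int]:
    """Read a per-domain integer list such as ' e_we = 91, 121,' (simple cases only)."""
    pattern = rf"^\s*{name}\s*=\s*([^\n]*)"
    match = re.search(pattern, namelist_text, re.MULTILINE | re.IGNORECASE)
    if match is None:
        raise KeyError(name)
    values = match.group(1).split("!")[0]  # drop a trailing Fortran comment
    return [int(tok) for tok in values.replace(",", " ").split()]


def small_domains(namelist_text: str, minimum: int = MIN_POINTS) -> list[tuple[int, int, int]]:
    """(domain number, e_we, e_sn) for each domain smaller than minimum in either direction."""
    e_we = parse_int_list(namelist_text, "e_we")
    e_sn = parse_int_list(namelist_text, "e_sn")
    return [(dom, we, sn) for dom, (we, sn) in enumerate(zip(e_we, e_sn), start=1)
            if we < minimum or sn < minimum]


if __name__ == "__main__":
    # Hypothetical &domains values, not the ones from the attached namelists.
    sample = """
&domains
 max_dom = 2,
 e_we = 61, 91,
 e_sn = 55, 85,
/
"""
    print(small_domains(sample))
```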
 
Hi, Kwerner,

Thank you for your reply.
Unfortunately, I don't have much time left to change the domains, but I will try once more.
I would just like to know what kind of impact this small domain will have on the results (whether it is drastic or minimal). Also,
as I said, I have always run restarts on this domain without a problem, so I suspect I will run into the same problem even if I change the domains as you recommended; there is probably another cause.

Let's assume that the current domain is acceptable and I still have the restart problem. Is there another way around this problem that you know of?
Could I, for example, save the wrfout files from the run that stopped because of the walltime somewhere, re-run real.exe to generate new inputs starting from the end time of the previous run, and then re-run wrf.exe from that time, and so on until I'm done,
since the WRF restart capability isn't working?
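Either way, a long run that does not fit in one job's walltime has to be split into consecutive segments. This sketch simply computes the segment boundaries; the September 2006 period and the 8-day segment length are hypothetical. With reinitialization, each segment start would need its own real.exe run (a `wrfinput` at that time and boundaries covering the segment); with restarts, each boundary should be a time at which a `wrfrst` file was written.

```python
from datetime import datetime, timedelta


def segments(start: datetime, end: datetime,
             length: timedelta) -> list[tuple[datetime, datetime]]:
    """Split the period [start, end] into consecutive chunks of at most `length`."""
    if length <= timedelta(0):
        raise ValueError("segment length must be positive")
    chunks = []
    t = start
    while t < end:
        stop = min(t + length, end)  # the last chunk may be shorter
        chunks.append((t, stop))
        t = stop
    return chunks


if __name__ == "__main__":
    # Hypothetical: a September 2006 run that fits about 8 days per job.
    for seg_start, seg_end in segments(datetime(2006, 9, 1), datetime(2006, 9, 30),
                                       timedelta(days=8)):
        print(seg_start.strftime("%Y-%m-%d_%H:%M:%S"), "->",
              seg_end.strftime("%Y-%m-%d_%H:%M:%S"))
```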
 
Hi,
This question is asked often, so I went ahead and created an FAQ about it. Take a look at the explanation in "Why domains should be at least 100x100 grid spaces".
Thank you kwerner,

But could you please answer the other half of the question?

"Let's assume that the current domain is acceptable and I still have the problem of the restart getting stuck and not running. Is there another way around this problem that you know of?
Could I, for example, save the wrfout files from the run that stopped because of the walltime in a folder somewhere, re-run real.exe to generate new wrfinput and wrfbdy files that start from the end time of the previous run, and then re-run wrf.exe from that time, and so on until I'm done,
since the WRF restart capability isn't working?"
 
Apologies for not answering that part of the question. The reason is that we can't assume the current domain size is okay. It needs to be larger.

However, if your domains were big enough and you still had issues like this that you couldn't overcome, you could do this, but it would not be the same simulation. Because each time step's results influence the next time step's results, starting a new simulation from initial conditions at a new time would give somewhat different results. I'm not sure how different they would be.

You could run a test over a period in which the model runs okay to see the differences. For example, say the model runs okay for 2 days. You could run a 2-day simulation, then run a test that uses all-new initial/boundary conditions for the final 6 hours of that period, and compare the output of both simulations at hour 48 to see how different they are and how comfortable you are using results with those differences. If they aren't very different, this method could possibly be okay for a shorter run, but if you're ultimately running for many weeks or months, it's probably not ideal.
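The comparison suggested above can be summarized with a few statistics on one field at the 48-hour time. This sketch assumes the field (for example 2 m temperature, `T2`) has already been extracted from the two wrfout files into flat lists of the same length; reading the netCDF files themselves (e.g. with a netCDF library) is not shown. The start date and the sample values are hypothetical.

```python
import math
from datetime import datetime, timedelta


def difference_stats(reference: list[float], test: list[float]) -> dict[str, float]:
    """Mean bias, RMSE and max |difference| of test minus reference."""
    if not reference or len(reference) != len(test):
        raise ValueError("fields must be non-empty and the same size")
    diffs = [t - r for r, t in zip(reference, test)]
    n = len(diffs)
    return {
        "bias": sum(diffs) / n,
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
        "max_abs": max(abs(d) for d in diffs),
    }


# Timing of the suggested test: the continuous run covers 48 h, and the
# reinitialized run starts 6 h before the comparison time.
run_start = datetime(2006, 9, 1)  # hypothetical start date
compare_at = run_start + timedelta(hours=48)
reinit_start = compare_at - timedelta(hours=6)

if __name__ == "__main__":
    # Hypothetical T2 values (K) at compare_at from the two runs.
    continuous = [290.0, 291.0, 292.0, 293.0]
    reinitialized = [290.5, 290.5, 292.0, 294.0]
    print("reinitialized run starts at", reinit_start)
    print(difference_stats(continuous, reinitialized))
```

Bias shows a systematic shift, while RMSE and the maximum difference show how large the local discrepancies are; which of these matters depends on how the results will be used.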
 