
Sudden Increase in Computation Time Over Multiple WRF Simulations

jjsbu

Hi there, I'm running some cases on our cluster with WRF V4.5.2. I noticed that performance dropped suddenly after several hours of simulation time. I've attached a figure containing scatter plots of the time taken to compute each step for each run. You'll notice that the computation time is very consistent before it suddenly increases and becomes much less consistent.

In each case, the added computation time caused me to run out of allocated time on the cluster (the model did not crash). I restarted each run (red dashed) and found that the computation time dropped significantly, back to the consistent times seen in the initial part of the simulation, except in case two, where the timing jump occurred again toward the end of the restart run.

I could not find any notable cause for the increase in the simulated fields, nor any notable effect of it on them. Wrfout files were similar in size, but their write time increased considerably at the same time as the computation time. I'm posting this here as I don't know where to begin troubleshooting this issue; perhaps it is related to hardware. I wonder if anyone else has come across this and may know the cause.
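In case it helps anyone reproduce the timing plots, here is a minimal sketch of how the per-step and per-write times can be pulled out of an rsl file. It assumes the usual `Timing for main: ... elapsed seconds` and `Timing for Writing ... elapsed seconds` line layout; the function name `parse_rsl_timings` is my own and not part of WRF.

```python
import re

# Assumed line layout, e.g.:
# Timing for main: time 2020-01-01_00:00:18 on domain   1:    2.34567 elapsed seconds
MAIN_RE = re.compile(
    r"Timing for main: time (\S+) on domain\s+(\d+):\s+([\d.]+) elapsed seconds"
)
# Assumed line layout, e.g.:
# Timing for Writing wrfout_d01_2020-01-01_00:00:00 for domain   1:    0.51234 elapsed seconds
WRITE_RE = re.compile(
    r"Timing for Writing (\S+) for domain\s+(\d+):\s+([\d.]+) elapsed seconds"
)


def parse_rsl_timings(lines):
    """Return (steps, writes) as lists of (label, domain, seconds) tuples.

    `lines` is any iterable of strings, for example an open rsl file.
    Lines that match neither pattern are ignored.
    """
    steps, writes = [], []
    for line in lines:
        m = MAIN_RE.search(line)
        if m:
            steps.append((m.group(1), int(m.group(2)), float(m.group(3))))
            continue
        m = WRITE_RE.search(line)
        if m:
            writes.append((m.group(1), int(m.group(2)), float(m.group(3))))
    return steps, writes
```

To use it on one of the attached logs, pass the open file object, e.g. `with open("rsl_out_0000_2020_run1.txt") as f: steps, writes = parse_rsl_timings(f)`, then plot the third element of each tuple against its index.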
 

Attachments

  • comp_time.png (1,000.6 KB)
  • namelist.input.2020.input (4.6 KB)
  • namelist.input.2022.input (4.6 KB)
  • info.input (286 bytes)
  • rsl_out_0000_2020_run1.txt (933.4 KB)
  • rsl_out_2020_restart.txt (399.2 KB)
  • rsl_out_2022_restart.txt (369.1 KB)
  • rsl_out_2022_run1.txt (488.4 KB)

Some physics schemes, for example radiation, take much longer to compute than others. This is what I saw in your rsl files, and it is normal.
I/O will also take longer.
 
If the allocated time is not sufficient, you may need to use WRF's restart capability to run your case.
 
Thanks Ming, yes, I did restart the simulations, and the large times immediately dropped back to the lengths seen at the start of the initial simulation. In my view, this indicates that the radiation scheme is not the cause (though its large time-to-compute is a symptom). The restart run computed the same time steps in far less time.

I am left suspecting either a memory offloading issue or a hardware issue. For now I will have to stop and restart WRF as and when this issue occurs.
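To decide when a restart is needed without watching the logs by hand, something like the following could flag the jump. It is only a sketch: the function name `find_slowdown` and the defaults for `baseline_n`, `window` and `factor` are my own guesses and would need tuning. It compares a sliding-window median of step times against the median of the first steps, so isolated slow steps (such as radiation calls) should not trigger it as long as they make up less than half of a window.

```python
from statistics import median


def find_slowdown(step_seconds, baseline_n=50, window=10, factor=2.0):
    """Return the start index of the first window whose median step time
    exceeds `factor` times the baseline median, or None if there is none.

    step_seconds : list of per-step elapsed times, in run order.
    baseline_n   : number of initial steps used as the baseline (assumed).
    window       : number of consecutive steps in each window (assumed).
    factor       : slowdown ratio that counts as a jump (assumed).
    """
    if len(step_seconds) < baseline_n + window:
        return None  # not enough data to compare against the baseline
    baseline = median(step_seconds[:baseline_n])
    for i in range(baseline_n, len(step_seconds) - window + 1):
        if median(step_seconds[i:i + window]) > factor * baseline:
            return i
    return None
```

Because the window looks forward from index `i`, the returned index can sit up to about half a window before the step where the times actually change.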
 
I still suspect that I/O and radiation could be the culprits.

Anyway please keep me updated if you have more information on this issue. Thanks in advance.
 