Abnormal slowdown after WRF restart run

peng

I encountered an abnormal slowdown when restarting WRF from a wrfrst file. This is a single-domain, one-way nested WRF simulation using ndown, with 100 m resolution and adaptive time stepping.
In the continuous run, after the adaptive timestep reaches dt = 1.20 s, the model runs normally:
"Timing for main (dt= 1.20): time 2026-02-18_00:00:22 on domain 1: 1.29544 elapsed seconds
Timing for main (dt= 1.20): time 2026-02-18_00:00:23 on domain 1: 1.26584 elapsed seconds
Timing for main (dt= 1.20): time 2026-02-18_00:00:24 on domain 1: 1.25132 elapsed seconds"

The output interval is 30 minutes, and each 30 minutes of simulated time takes about 30 minutes of wall-clock time.
However, after stopping the run and restarting from wrfrst_d01_2026-02-20_12:00:00, the model becomes much slower, although the timestep is still dt = 1.20 s:
"Timing for main (dt= 1.20): time 2026-02-20_12:00:01 on domain 1: 6.13815 elapsed seconds
Timing for main (dt= 1.20): time 2026-02-20_12:00:02 on domain 1: 5.52845 elapsed seconds
Timing for main (dt= 1.20): time 2026-02-20_12:00:03 on domain 1: 5.57714 elapsed seconds"

So for the same dt = 1.20 s, the continuous run takes about 1.2–1.3 s per step, while the restart run takes about 5.5–6.1 s per step. After restart, it takes nearly 2 hours of wall-clock time to produce 30 minutes of simulation output.
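As a rough cross-check of these numbers, here is a minimal Python sketch that parses the timing lines quoted above and estimates the wall-clock cost of one 30-minute output interval. It is not part of WRF; the function names `mean_step_seconds` and `wall_minutes` are hypothetical, and it assumes dt stays at 1.20 s and that the three quoted steps are representative of each run.

```python
import re

# Pattern for the "Timing for main" lines quoted in this thread;
# the single capture group is the elapsed wall-clock seconds for that step.
TIMING_RE = re.compile(
    r"Timing for main \(dt=\s*[\d.]+\): time \S+ on domain\s+\d+:\s*([\d.]+) elapsed seconds"
)

def mean_step_seconds(log_text):
    """Return the mean wall-clock seconds per step over all timing lines found."""
    elapsed = [float(m.group(1)) for m in TIMING_RE.finditer(log_text)]
    if not elapsed:
        raise ValueError("no 'Timing for main' lines found")
    return sum(elapsed) / len(elapsed)

def wall_minutes(sim_minutes, dt_seconds, seconds_per_step):
    """Estimate wall-clock minutes needed to advance sim_minutes of model time."""
    steps = sim_minutes * 60.0 / dt_seconds
    return steps * seconds_per_step / 60.0

# Log excerpts copied from the post above.
continuous = """
Timing for main (dt= 1.20): time 2026-02-18_00:00:22 on domain 1: 1.29544 elapsed seconds
Timing for main (dt= 1.20): time 2026-02-18_00:00:23 on domain 1: 1.26584 elapsed seconds
Timing for main (dt= 1.20): time 2026-02-18_00:00:24 on domain 1: 1.25132 elapsed seconds
"""
restart = """
Timing for main (dt= 1.20): time 2026-02-20_12:00:01 on domain 1: 6.13815 elapsed seconds
Timing for main (dt= 1.20): time 2026-02-20_12:00:02 on domain 1: 5.52845 elapsed seconds
Timing for main (dt= 1.20): time 2026-02-20_12:00:03 on domain 1: 5.57714 elapsed seconds
"""

cont = mean_step_seconds(continuous)
rest = mean_step_seconds(restart)
print(f"continuous: {cont:.2f} s/step, {wall_minutes(30, 1.2, cont):.0f} min per 30 min of output")
print(f"restart:    {rest:.2f} s/step, {wall_minutes(30, 1.2, rest):.0f} min per 30 min of output")
print(f"slowdown factor: {rest / cont:.1f}x")
```

With the six lines quoted here, this gives roughly 1.27 s versus 5.75 s per step, i.e. about 32 versus 144 wall-clock minutes per 30 simulated minutes, a slowdown of about 4.5x.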
I have checked that:
1. The number of MPI processes is the same.
2. The namelist settings are the same.
3. The timestep is the same after adaptive time stepping reaches dt = 1.20 s.
4. The previous WRF job was stopped before restarting.
5. Similar behavior has also been observed in other simulations and on other servers.
The restart file appears to be read correctly:
LBC for restart: Found the correct bounding LBC time periods for restart time = 2026-02-20_12:00:00.
Any suggestions would be appreciated.
 
I confirm that the simulation was restarted using a **wrfrst** file. Under normal circumstances, the results should not differ from running the simulation continuously. However, I am wondering whether the extended runtime will affect the simulation results.
 
Hi, can you let me know at which stage of your ndown simulation this is happening (e.g., is this a restart of the coarse or fine domain)? Will you please provide the namelists you use to do this, as well as your rsl.* files? Please package all rsl files into a single *.tar or zipped file and attach that. Thanks!
 
These are the namelist and rsl.error files from my direct run (WRF), as well as the restart files (wrfrst). I am using WRF version 4.4. I am currently running the main simulation after nesting down from d02 (500 m) to d03 (100 m), so this is a restart of the fine-resolution domain. Because the previous log files were overwritten, these are the logs from a restart I just performed; the slowdown is reproducible to some degree. Do you have any idea where the problem might be?
 

I'm sorry that I cannot provide all the rsl files from the direct run, as they have been overwritten. I have checked that there is no redundant extra information within the restart time range, and everything else is running normally.
 
Thanks for providing those. It would be helpful if I could see the namelist you used for the non-restart run, to compare with the restart namelist. To get around the overwritten rsl files, run a very short (e.g., 6-12 hour) non-restart simulation, then package the resulting rsl files into a zipped file. Then do the same for the restart simulation, zipping those rsl files under a different file name than the one used for the non-restart rsl files. Please attach both zipped files so that I can compare the rsl files from the two runs.
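One possible way to do the packaging described above, so that the second run cannot overwrite the first set of logs, is sketched below in Python using only the standard library. The helper name `package_rsl` and the archive names in the usage comment are hypothetical; a plain `tar` or `zip` command would work equally well.

```python
import glob
import os
import tarfile

def package_rsl(run_dir, archive_name):
    """Bundle every rsl.* file in run_dir into one gzipped tar archive.

    Returns the number of files that were added.
    """
    rsl_files = sorted(glob.glob(os.path.join(run_dir, "rsl.*")))
    if not rsl_files:
        raise FileNotFoundError(f"no rsl.* files found in {run_dir}")
    with tarfile.open(archive_name, "w:gz") as tar:
        for path in rsl_files:
            # Store only the file name, not the full directory path.
            tar.add(path, arcname=os.path.basename(path))
    return len(rsl_files)

# Example usage (archive names are placeholders):
#   after the short non-restart run:
#       package_rsl(".", "rsl_nonrestart.tar.gz")
#   after the restart run:
#       package_rsl(".", "rsl_restart.tar.gz")
```

Archiving immediately after each run finishes, and before launching the next one, keeps the two sets of rsl files separate even though WRF writes them under the same names.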
 