"Timing for main: time" messages when running wrf.exe for several hours

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

Hi there, I'm new to the WRF model.
I've run into a problem and I'm hoping someone can help me get past it <3
Detail:
Everything before this step (WPS and real.exe) seemed to run OK, but after I submitted wrf.exe on this system, I realized that only 2 wrfout files had been written (for the start time only), one for each of the 2 domains.
And then nothing happened!
In rsl.error.0000 I can see this:
Timing for main: time 2020-07-01_00:00:45 on domain 2: 997.42902 elapsed seconds
There are actually many more lines like the one above, and I don't know why.
Configuration:
WRF3.7
System: CentOS Linux 7
Extra clue:
The "elapsed seconds" messages appeared only in rsl.error.0000, not in rsl.error.0001 through rsl.error.0007.
It also took up to 15 minutes to run real.exe, which seems like a long time for real.exe, I think!
Please help me get past this. Any help will be appreciated <3!
 

Attachments

  • namelist.input (3.7 KB)
  • rsl.error.0001.txt.txt (8.2 KB)
  • rsl.error.0000.txt.txt (10.5 KB)
  • submit_wrf.sh.txt (381 bytes)

Hi,
There is no error in your rsl* files; the model simply stops. The rsl.error.0000 file typically has the most information, so it's normal that you see more output there. However, the timing information shows:
Code:
Timing for main: time 2020-07-01_00:04:30 on domain   2:  814.10748 elapsed seconds
Timing for main: time 2020-07-01_00:04:30 on domain   1: 4042.27441 elapsed seconds
Timing for main: time 2020-07-01_00:05:15 on domain   2:  812.00732 elapsed seconds
Timing for main: time 2020-07-01_00:06:00 on domain   2:  808.77527 elapsed seconds
Timing for main: time 2020-07-01_00:06:45 on domain   2:  812.11230 elapsed seconds
Timing for main: time 2020-07-01_00:06:45 on domain   1: 4004.40820 elapsed seconds
This means that each domain 1 time step is taking more than an hour of wall-clock time (about 4000 seconds), and each domain 2 time step is taking about 13.5 minutes (about 810 seconds), which is not okay! The model may be stopping because you run out of wall-clock time in your batch submission. Your namelist looks very basic and simple, so I'm not sure this is actually related to WRF; it may be more of a system problem.
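
To see at a glance how slow the run is, the timing lines can be summarized per domain. The following is a minimal Python sketch and not part of WRF. The function names (parse_timings, summarize) are made up for illustration, the sample lines are copied from the excerpt above, and the model step length is only inferred from the difference between consecutive timestamps, not read from the namelist.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as:
# Timing for main: time 2020-07-01_00:04:30 on domain   2:  814.10748 elapsed seconds
TIMING_RE = re.compile(
    r"Timing for main: time (\S+) on domain\s+(\d+):\s+([\d.]+) elapsed seconds"
)


def parse_timings(lines):
    """Return {domain: [(model_time, elapsed_seconds), ...]} from rsl log lines."""
    timings = defaultdict(list)
    for line in lines:
        match = TIMING_RE.search(line)
        if match:
            stamp, domain, elapsed = match.groups()
            model_time = datetime.strptime(stamp, "%Y-%m-%d_%H:%M:%S")
            timings[int(domain)].append((model_time, float(elapsed)))
    return dict(timings)


def summarize(timings):
    """Return {domain: (mean_elapsed_seconds, model_step_seconds or None)}."""
    summary = {}
    for domain, entries in timings.items():
        mean_elapsed = sum(elapsed for _, elapsed in entries) / len(entries)
        # Assumption: the model step is the gap between the first two timestamps.
        step = None
        if len(entries) > 1:
            step = (entries[1][0] - entries[0][0]).total_seconds()
        summary[domain] = (mean_elapsed, step)
    return summary


# Sample lines copied from the rsl.error.0000 excerpt in this thread.
sample = """\
Timing for main: time 2020-07-01_00:04:30 on domain   2:  814.10748 elapsed seconds
Timing for main: time 2020-07-01_00:04:30 on domain   1: 4042.27441 elapsed seconds
Timing for main: time 2020-07-01_00:05:15 on domain   2:  812.00732 elapsed seconds
Timing for main: time 2020-07-01_00:06:00 on domain   2:  808.77527 elapsed seconds
Timing for main: time 2020-07-01_00:06:45 on domain   2:  812.11230 elapsed seconds
Timing for main: time 2020-07-01_00:06:45 on domain   1: 4004.40820 elapsed seconds
"""

for domain, (mean_elapsed, step) in sorted(summarize(parse_timings(sample.splitlines())).items()):
    print(f"domain {domain}: mean {mean_elapsed:.1f} s of wall-clock time per step")
    if step:
        # Wall-clock seconds spent per simulated second; far above 1 means
        # the model runs much slower than real time.
        print(f"  inferred model step {step:.0f} s, ratio {mean_elapsed / step:.1f}")
```

To run it on a real log, pass the file's lines in place of the sample, for example `parse_timings(open("rsl.error.0000"))`.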

1) Just to verify, you did compile WRF with a distributed memory (dmpar) option, correct? You mention that the WRF version is 3.7, but according to your rsl files, it's version 4.0. I just want to make sure you're using the version of code you intend.

2) Did you make any modifications to the code, or is it "out-of-the-box" code? If you made modifications, those mods could be the problem.

If the two points above don't explain the issue, unfortunately, I believe you're going to have to discuss this problem with a systems administrator at your institution, as it's very likely related to your specific system and/or environment. If you do get it figured out, please let us know what the problem/solution was, so that it may help future users.