"Timing for main: time" when run wrf.exe for several hours

PhungQuangNghia3108
Posts: 4
Joined: Wed Mar 31, 2021 8:57 am

Post by PhungQuangNghia3108 » Mon Apr 12, 2021 7:11 am

Hi there, I'm new to the WRF model.
I've run into a problem and I'm trying to find someone who can help me get past it <3
Details:
Everything before this step seemed to run OK (WPS and real.exe), but when I submitted wrf.exe on this system, I realized that only 2 wrfout files had been written (for the start time only) for the 2 domains.
After that, nothing happened!
In rsl.error.0000 I can see this:
Timing for main: time 2020-07-01_00:00:45 on domain 2: 997.42902 elapsed seconds
Actually, there are more lines like the one above, and I don't know why.
Configuration:
WRF3.7
System: CentOS Linux 7
Extra clue:
The "elapsed seconds" message appeared only in rsl.error.0000, not in rsl.error.0001, 0002, ..., 0007.
Also, real.exe took up to 15 minutes to run, which seems like a long time for real.exe, I think!
Please help me get past this; any help will be appreciated <3 !
Attachments:
submit_wrf.sh.txt (381 Bytes)
rsl.error.0000.txt.txt (10.48 KiB)
rsl.error.0001.txt.txt (8.19 KiB)
namelist.input (3.75 KiB)

kwerner
Posts: 2287
Joined: Wed Feb 14, 2018 9:21 pm

Re: "Timing for main: time" when run wrf.exe for several hours

Post by kwerner » Mon Apr 12, 2021 7:31 pm

Hi,
There is no error in your rsl* files; the output simply stops. Typically rsl.error.0000 has the most information, so it's normal that you would see more there. However, the timing information shows:

Timing for main: time 2020-07-01_00:04:30 on domain   2:  814.10748 elapsed seconds
Timing for main: time 2020-07-01_00:04:30 on domain   1: 4042.27441 elapsed seconds
Timing for main: time 2020-07-01_00:05:15 on domain   2:  812.00732 elapsed seconds
Timing for main: time 2020-07-01_00:06:00 on domain   2:  808.77527 elapsed seconds
Timing for main: time 2020-07-01_00:06:45 on domain   2:  812.11230 elapsed seconds
Timing for main: time 2020-07-01_00:06:45 on domain   1: 4004.40820 elapsed seconds
This means that each domain 1 time step is taking more than an hour of wall-clock time (over 4000 seconds), and each domain 2 time step is taking more than 13 minutes, which is not okay! The model may be stopping because you run out of wall-clock time in your batch submission. Your namelist looks very basic and simple, so I'm not sure this is actually related to WRF; it may be more of a system problem.
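
To see how slow each domain is at a glance, the "Timing for main" lines can be summarized with a short script. This is only a minimal sketch: the regular expression assumes the line format quoted above, and the function name summarize_timing is a hypothetical helper, not part of WRF.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the rsl.error.0000 excerpt quoted in this thread:
# Timing for main: time 2020-07-01_00:04:30 on domain   2:  814.10748 elapsed seconds
TIMING_RE = re.compile(
    r"Timing for main: time (\S+) on domain\s+(\d+):\s+([\d.]+) elapsed seconds"
)


def summarize_timing(lines):
    """Return {domain: (step_count, mean_seconds, max_seconds)}."""
    per_domain = defaultdict(list)
    for line in lines:
        match = TIMING_RE.search(line)
        if match:
            domain = int(match.group(2))
            per_domain[domain].append(float(match.group(3)))
    return {
        domain: (len(values), sum(values) / len(values), max(values))
        for domain, values in per_domain.items()
    }


# The six lines quoted above; for a real run, read rsl.error.0000 instead.
sample = """\
Timing for main: time 2020-07-01_00:04:30 on domain   2:  814.10748 elapsed seconds
Timing for main: time 2020-07-01_00:04:30 on domain   1: 4042.27441 elapsed seconds
Timing for main: time 2020-07-01_00:05:15 on domain   2:  812.00732 elapsed seconds
Timing for main: time 2020-07-01_00:06:00 on domain   2:  808.77527 elapsed seconds
Timing for main: time 2020-07-01_00:06:45 on domain   2:  812.11230 elapsed seconds
Timing for main: time 2020-07-01_00:06:45 on domain   1: 4004.40820 elapsed seconds
"""

summary = summarize_timing(sample.splitlines())
for domain, (count, mean_s, max_s) in sorted(summary.items()):
    print(f"domain {domain}: {count} steps, mean {mean_s:.1f} s, max {max_s:.1f} s")
```

To run it on an actual log, replace sample.splitlines() with an open file handle, for example open("rsl.error.0000"). Comparing the mean seconds per step against the wall-clock limit in the batch script gives a rough idea of how many steps can complete before the job is killed.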

1) Just to verify, you did compile WRF with a distributed memory (dmpar) option, correct? You mention that the WRF version is 3.7, but according to your rsl files, it's version 4.0. I just want to make sure you're using the version of code you intend.

2) Did you make any modifications to the code, or is it "out-of-the-box" code? If you made modifications, those mods could be the problem.

If the two points above don't explain the issue, unfortunately, I believe you're going to have to discuss this problem with a systems administrator at your institution, as it's very likely related to your specific system and/or environment. If you do get it figured out, please let us know what the problem/solution was, so that it may help future users.
NCAR/MMM
