WRF does not go further with no error message

Vera

Hi,

I think I need your help. I have a simulation that already ran with v4.2.1 for 2 years (domain d01 only) and then for 7 more months with domains d01 and d02. The domain encompasses some mountains (the southern part of the Andes). After these 2 years and 7 months, I got a CFL error. I reduced the time step from 60 s to 10 s and the CFL messages disappeared. In fact, there is no longer any error message: the simulation does not stop, but it does not advance either. Eventually the run ends because of the HPC maximum job duration policy.
I have attached my namelist and the rsl.error files for both runs (60 s and 10 s time steps).
Do you have any suggestions for solving my problem?

Thank you very much for your help
 

Attachments

  • bug_fichiers.zip
    49.9 KB
Hi,
Can you package up all of your rsl* files into a single zipped file - for the run with a 10s time_step, and attach that? Thanks!
 
Hi,
Here are the files. Thank you very much for your help.
 

Attachments

  • run10s.zip
    252.2 KB
Thanks for sending that!
You said you previously ran this simulation with a different version of WRF. Was everything else the same when you ran that? I.e., same namelist (besides the reduced timestep), same physics options, same input data, dates, domain, etc.?
 
No, sorry, my message was confusing. I have always used v4.2.1. I just meant that the simulation ran successfully for 2015 and 2016 with domain d01, and then for the first 6 months of 2017 (always with the same namelist, boundaries, etc.).
 
Thank you for the clarification. Can you package up your wrfinput*, wrfbdy_d01, and wrflowinp* files into a single .tar or zipped file and share those with me? I'd like to test using your files. That file will almost certainly be too large to attach, so see the home page of this forum for instructions on sharing large files. Thanks!
 
Hi,
I uploaded the files to the forum cloud. The file is called run10s.tar.
Thank you for helping me
 
Thanks for uploading those files. I was able to repeat your issue, but I was also able to get past it with a namelist change. In your namelist, you have
Code:
epssm = 1
First, this value shouldn't be more than 0.9. Second, it's a per-domain namelist parameter, so it should be set for both domains. I changed it to
Code:
epssm = 0.9, 0.9
and it was able to run. Can you give that a try and let me know if it helps?
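
For anyone who wants to check their own namelist for the same two problems, here is a minimal Python sketch. The helper name `check_epssm` is hypothetical (it is not part of WRF), and the 0.9 limit and the one-value-per-domain rule are simply the two points made in this post, not an official validation.

```python
import re

def check_epssm(namelist_text, max_dom, limit=0.9):
    """Return a list of problems found with the epssm entry (empty if none).

    Hypothetical helper: applies only the two checks discussed in this thread.
    """
    # Grab everything after "epssm =" up to a comment (!), a "/" or end of line.
    match = re.search(r"^\s*epssm\s*=\s*([^!\n/]*)", namelist_text,
                      re.MULTILINE | re.IGNORECASE)
    if match is None:
        return ["epssm is not set in this namelist"]
    # Split the comma-separated list, ignoring a trailing comma.
    values = [float(v) for v in match.group(1).split(",") if v.strip()]
    problems = []
    if len(values) < max_dom:
        problems.append(f"epssm has {len(values)} value(s) but max_dom is {max_dom}")
    problems += [f"epssm value {v} exceeds {limit}" for v in values if v > limit]
    return problems
```

Called as `check_epssm(open("namelist.input").read(), max_dom=2)`, it would flag `epssm = 1` twice (too few values, and above 0.9) and return an empty list for `epssm = 0.9, 0.9`.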

By the way, I also removed the following three settings because they caused problems with my MPI, but they may not be a problem for you.
Code:
nproc_x  = 8,
nproc_y  = 10,
numtiles = 1
 
Hi kwerner,

Thank you for your help. I made the modifications to the namelist, but I still have the issue: after time 2017-08-15_06:04:50 the run does not advance, and there is no error message :-(

Thank you,
 
The only difference I see between our runs is that I ran the case with 100 processors and you ran with 80. I also tested with the latest code (v4.6.0), and I'm not sure whether that is making the difference. Can you try testing with that version and with 100 processors?

Did you happen to modify any of the code - or are you using pristine (out-of-the-box) code?
Could it be possible that you're running out of disk space in your running directory?
 
Hi Kwerner,

I ran some tests, but my issue remains:
- I tested the latest code (v4.6.0).
- I changed the number of processors from 80 to 40. I cannot test with 100 processors because the HPC policy allows a maximum of 80.
The issue is the same in all the tests: WRF does not advance and gives no error message.

There is enough disk space, and rerunning an earlier period works just fine. So the issue is specific to the day 2017-08-15.
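
To pin down the last model time reached, the rsl files can be scanned for their final timing line. This is only a sketch: it assumes the usual `Timing for main: time <date> on domain <n>:` line format, and the names `last_completed_step` and `report` are made up for illustration.

```python
import glob
import re

# Assumed format of the per-step timing lines written to the rsl files.
TIMING = re.compile(r"Timing for main: time (\S+) on domain\s+(\d+):")

def last_completed_step(log_text):
    """Return {domain number: last model time seen} for one file's contents."""
    last = {}
    for line in log_text.splitlines():
        m = TIMING.search(line)
        if m:
            # Later lines overwrite earlier ones, so the final value wins.
            last[int(m.group(2))] = m.group(1)
    return last

def report(paths):
    """Print the last timing entry per domain for each given rsl file."""
    for path in sorted(paths):
        with open(path, errors="replace") as handle:
            print(path, last_completed_step(handle.read()))

if __name__ == "__main__":
    report(glob.glob("rsl.error.*"))
```

Run from the WRF run directory, this prints one line per rsl.error file, which makes it easy to see where each log stops.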

Do you have any other ideas about what the bug could be, or other tests I could try?

Thank you very much
 