WRF freezes, keeps running but it doesn't write the output

cross

Member
I've been having random errors where the WRF process keeps running but stops writing output (wrfout, wrfrst, rsl). The only way to fix it is to change the time step or the microphysics parameterization (e.g., from 6 to 3).
This happens randomly, across different servers, different configurations, different years, and different compilations, so I don't know how to track down the problem.
Any idea what could be causing this, or what to check?
 

Ming Chen

Moderator
Staff member
This could be either a computer issue or a problem with the physics. Please take a look at your RSL files for possible error messages.
By the way, which version of WRF are you running? Can you upload the namelist.input file (the one that doesn't work) for me to take a look?
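The check of the RSL files suggested above can be scripted. The following is a minimal sketch, not an official WRF tool: it assumes the logs are named rsl.* in the run directory and that CFL diagnostics contain the substring "cfl" (as in lines like "points exceeded cfl=2 in domain d01"). The function name find_cfl_lines is made up for this example.

```python
import re
from pathlib import Path

# Assumption: CFL diagnostics in the RSL logs contain "cfl" in some letter case.
CFL_PATTERN = re.compile(r"cfl", re.IGNORECASE)


def find_cfl_lines(run_dir):
    """Return (file name, line number, text) for every RSL line mentioning CFL."""
    hits = []
    # Assumption: per-rank logs are named rsl.error.NNNN and rsl.out.NNNN.
    for path in sorted(Path(run_dir).glob("rsl.*")):
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if CFL_PATTERN.search(line):
                    hits.append((path.name, number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Scan the current directory and print each match with its location.
    for name, number, text in find_cfl_lines("."):
        print(f"{name}:{number}: {text}")
```

If the run hung without flushing its logs, this may find nothing even though a CFL violation occurred, which matches the behaviour described later in the thread.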
 

cross

Member
It has happened on two different computers, using WRF 4.3.3 and 4.4.
I'm about to start a new run; if it freezes again, I'll upload the namelist.input.
The rsl files just stop being written, without showing any error.
I thought it could be a memory or disk error, but everything seems to be OK.
I compiled with the Intel compilers, but I've never had a problem with them before.
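Since the symptom here is a process that stays alive while its files stop changing, one way to notice it early is to watch file modification times. This is only a sketch under stated assumptions: the file patterns (rsl.*, wrfout_*, wrfrst_*), the 30-minute idle limit, and the function names are illustrative choices, and the limit should be longer than the longest normal gap between writes on your system.

```python
import time
from pathlib import Path


def newest_mtime(run_dir, patterns=("rsl.*", "wrfout_*", "wrfrst_*")):
    """Return the most recent modification time among matching files, or None."""
    times = [
        path.stat().st_mtime
        for pattern in patterns
        for path in Path(run_dir).glob(pattern)
    ]
    return max(times) if times else None


def is_stalled(run_dir, max_idle_seconds=1800, now=None):
    """True if no log or output file was modified within the idle limit."""
    latest = newest_mtime(run_dir)
    if latest is None:
        return False  # nothing written yet, so there is nothing to judge
    now = time.time() if now is None else now
    return (now - latest) > max_idle_seconds
```

A cron job or a loop in the job script could call is_stalled on the run directory and send an alert. It does not explain why the run hung; it only flags that it did.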
 

cross

Member
I've run some tests, and the problem seems to be the Intel compiler or Intel MPI.
At the same point where it freezes, I get a CFL error when using gcc and OpenMPI.
 

Ming Chen

Moderator
Staff member
CFL errors indicate that the model is numerically unstable. Can you upload your namelist.input and namelist.wps for me to take a look?
Is your model domain located in a high-topography area?
 

cross

Member
Yes, it is over the Andes. My problem is not the CFL errors themselves, which I have to deal with in this area; the problem was that I wasn't getting any error message and the process kept running.
I compiled WRF with the Intel compiler + OpenMPI, and now I get an error message. So I think the problem may be with Intel MPI.
I've had problems with these versions:
Intel(R) MPI Library for Linux* OS, Version 2021.3 Build 20210601
Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227
 

Ming Chen

Moderator
Staff member
Depending on the machine, a job sometimes stops progressing and just hangs. I guess this is the case for you.
Please kill the job, resolve the CFL violation, then try again.
You can try to:
(1) reduce the time step
(2) increase epssm from the default value (0.2) to a larger value such as 0.7
(3) turn on w_damping
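The three changes above are edits to namelist.input. As a rough illustration, the sketch below applies them to namelist text with a small helper. The helper set_namelist_value is hypothetical, and the sample namelist and its starting values (including the halved time step) are placeholders, not recommendations; only the epssm value of 0.7 and turning on w_damping come from the suggestion above.

```python
import re


def set_namelist_value(text, key, value):
    """Replace the value(s) of an existing `key = ...` line in namelist text."""
    # Match only the exact key at the start of a line (after optional indent).
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{value},", text)
    if count == 0:
        raise KeyError(f"{key} not found in namelist text")
    return new_text


# Placeholder namelist fragment for illustration only.
example = """&domains
 time_step = 60,
 time_step_fract_num = 0,
/
&dynamics
 w_damping = 0,
 epssm = 0.2, 0.2,
/
"""

patched = set_namelist_value(example, "time_step", 30)      # (1) smaller time step
patched = set_namelist_value(patched, "epssm", "0.7, 0.7")  # (2) one value per domain
patched = set_namelist_value(patched, "w_damping", 1)       # (3) turn on w_damping
print(patched)
```

The helper only rewrites keys that are already present and raises an error otherwise, so it will not silently add a variable to the wrong namelist group.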
 

cross

Member
I tried that, but it didn't work.
I was using SMS-3DTKE, so I switched to the Shin-Hong scheme and it worked; the run is now more stable.
 

deva_wrf

Member
Hey, I am facing the same problem. I am using WRF 4.0 for my study with GFS as the initial condition, and after running for some time steps the model freezes. I don't understand what the problem is, since I have already run the model with the same namelist for a different initial condition.
 

Deleted member 3607

Guest
I have seen similar problems with Intel and have helped some people with them on these forums; maybe those threads will help you. But without the namelist files and the log files of the runs, I can't be sure you are facing the same problem.


 

Ming Chen

Moderator
Staff member
Would you please post the namelist.input that used 3DTKE (i.e., the failed case) for me to take a look? Thanks.
 