
RIBX never exceeds RIC error in WRF4.4.2

dbpartha

New member
Hi all,

I know this is a commonly encountered and discussed issue on this forum, but I haven't found a solution after much reading (related namelists, run scripts, and error logs are attached).

What I am trying to do: Run WRF4.4.2 for 2023/07 and 2023/08 at 12, 4, and 1.33 km using the PX LSM (soil moisture nudging turned on for the first 10 days of spin-up, to be turned off for the final run), and then use the outputs as inputs for a coupled WRF4.4.2-CMAQv5.5 simulation.

What I have done so far:
1. Used a relatively older WPS to create met_em* files from NARR.
2. Used the met_em* files, upper-air obs (ds351.0), and surface obs (ds461.0) in OBSGRID to create metoa_em* and wrfsfdda_d0* files for all 3 domains.
3. Used the metoa_em* and wrfsfdda_d0* files in real.exe to create initial/boundary condition files (wrfbdy_d01, wrffdda_d0*, wrfinput_d0*, wrflowinp_d0*).
4. Used those files to run WRF4.4.2.

The error:
d03 2023-06-21_00:00:06+02/03 in ACM PBL

-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 415
RIBX never exceeds RIC, RIB(i,kte) = 0.00000000 THETAV(i,1) = NaN MOL= NaN TCONV = 0.229080185 WST = NaN KMIX = 35 UST = NaN TST = NaN U,V = NaN NaN I,J= 217 14
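For context, the failing check can be sketched as follows. This is an illustrative bulk Richardson number in Python, not the exact ACM2 PBL formulation: once any input (here THETAV) is NaN, the comparison against the critical value RIC is false at every level, which is exactly the condition that triggers the fatal error above.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m s^-2)

def bulk_richardson(thetav_sfc, thetav_z, dz, u, v):
    """Illustrative bulk Richardson number (NOT the exact ACM2 code):
    Rib = (g / thetav_sfc) * (thetav_z - thetav_sfc) * dz / (u**2 + v**2)."""
    return (G / thetav_sfc) * (thetav_z - thetav_sfc) * dz / (u ** 2 + v ** 2)

RIC = 0.25  # critical Richardson number used by the check

# With valid inputs, some level can satisfy Rib > RIC:
print(bulk_richardson(300.0, 302.0, 100.0, 3.0, 4.0) > RIC)

# But once THETAV (or U, V) is NaN, Rib is NaN, and NaN > RIC is False at
# every level -- the scheme never finds a level where RIBX exceeds RIC:
print(bulk_richardson(np.nan, 302.0, 100.0, 3.0, 4.0) > RIC)
```

So the "RIBX never exceeds RIC" message is usually a symptom: something upstream (here the NaNs in THETAV, MOL, UST, U, V) has already gone bad at that grid point.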


Trials to solve the issue:

1. Reduced and increased the time step (didn't help).
2. Turned off/commented out:

  • ! sf_surface_mosaic = 1,
  • ! mosaic_cat = 8,
3. Reduced the nudging coefficients:
  • guv = 0.0003, 0.0001, 0.0000,
  • gt = 0.0003, 0.0001, 0.0000,
  • gq = 0.00001, 0.00001, 0.0000,
Important additional notes:
1. I have used the same metoa_em* and wrfsfdda_d0* files for all 3 domains to create boundary condition files with WRF3.8's real.exe and found that the BC files from WRF3.8 are much larger than those from WRF4.4.2 (see attached screenshots). WRF3.8 with essentially the same namelist.input runs without any issues.
2. When I create metoa_em* files and run real.exe, the grid%tsk unreasonable error shows up in both WRF versions. As a workaround, I renamed the SKINTEMP variable in the metoa_em* files to TSK and used them for the WRF runs in 4.4.2 and 3.8 (the 3.8 run works).
3. My concern (and possibly the cause of this issue) is why the BC files from WRF4.4.2 are much smaller than those from WRF3.8 when created with the same metoa_em* and wrfsfdda_d0* files. To check whether my WRF4.4.2 run directory was somehow missing something, I recompiled WRF4.4.2 in a fresh directory, but the BC files are still smaller than the WRF3.8 ones.

4. WRF4.4.2 ran for one simulation day (2023/06/21) and stopped at 2023/06/22.

Kindly help me fix this issue. Mentioning @kwerner @Ming Chen for their wisdom on this.
 

Attachments

  • Screen Shot 2025-03-09 at 5.33.01 PM.png (187.1 KB)
  • Screen Shot 2025-03-09 at 5.33.44 PM.png (169.9 KB)
  • rsl.errors.zip (2.8 MB)
  • namelist.input.txt (7.5 KB)
  • namelist.oa.txt (2.9 KB)
  • namelist.wps.txt (2.1 KB)
  • WRF4.4.2.runscript.txt (2.6 KB)
Hi,

I believe the difference in the size of the boundary files is because newer WRF applies netCDF compression, making the files smaller. If you take a look at the data in the wrfbdy file and it all looks okay, it's probably safe to use.

Since you are doing so many different things with this simulation, these are some tests you could do to try to narrow down what exactly is causing the issue:

1) If you just ran WPS and WRF, without using the extra OBSGRID files, does this still fail with the same error?
2) Same as 1), but try using WPSV4.6.0 and WRFv4.6.1.
3) If you use a different physics option (i.e., not PX-LSM), does that make a difference? This specific physics option has a reputation for being very finicky.
4) Try running just a single domain. If that runs okay, try 2 domains - to figure out which domain causes the issue.
5) Try setting the following:
guv = 0.0001,
gt = 0.0001,
 
Hi @kwerner, thanks for the suggestions. Soon after posting this issue, I found that newer WRF/WPS uses netCDF compression, which was causing the size difference in the IC/BC files (no issues with the IC/BC files themselves). I have been testing different options (including your suggestions) and found:

1. Without using the extra OBSGRID files, I don't face this issue (as long as SKINTEMP is renamed to TSK in the met_em* files; otherwise the grid%tsk unreasonable error shows up in real.exe, so all the NARR trials here included renaming SKINTEMP to TSK).
2. Used WPSV4.6.0 and WRFv4.6.1, but the issue persists as long as I use NARR data for June, July, and August 2023.
3. A non-PX-LSM option doesn't cause this issue.
4. Haven't tried single-domain cases, but the RIBX never exceeds RIC error always showed up on d03 (1.33 km).

The problem, in my opinion, stems from renaming SKINTEMP to TSK in the met_em* files. So I switched to 6-hourly 12 km NAM analysis data for creating met_em* files in WPS4.6.0, but ultimately hit the RIBX never exceeds RIC error again.

5. So I have tested different physics options, and the set copied below is working so far with NAM (I tried different physics options with NARR with SKINTEMP renamed, but none worked):
  • grid_fdda = 1, 1, 1,
  • grid_sfdda = 1, 1, 0,
  • pxlsm_soil_nudge = 1, 1, 0,
  • guv = 0.0003, 0.0001,
  • gt = 0.0003, 0.0001,
  • gq = 0.00001, 0.00001,
 
Hi again,

The above options ran WRF for 12-13 hours (covering the whole first simulation day, 2023/06/21) and crashed with the same "RIBX never exceeds..." error right after stepping into the second day (06/22), at the 6th simulation second on domain 3: d03 2023-06-22_00:00:06+02/03 in ACM PBL. I have tried reducing and even commenting out the nudging coefficients, but it fails at different simulation times. Since I am simulating summer months, and a previous alumnus of my group used different nudging coefficients for different seasons, does this sound like a possible cause? I may have seen somewhere on this forum that people had difficulty simulating summer months with the newer WRF model. What could be the solution if I need to simulate summer months using a newer WRF (for example, v4.5.1)?
 
Since the simulation works okay without the nudging files, it seems to be related to that input. The error message gives an i,j location. Check the nudging files to see if you're missing data in that location, or if the data seem corrupted or unreasonable.
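One way to do that check: pull the slab at the reported point out of the nudging file and scan a small window for NaNs or wild values. A minimal numpy sketch; the demo uses a synthetic array, and the wrffdda variable name in the comment is an assumption (check `ncdump -h` for the actual names in your files):

```python
import numpy as np

def check_point(field, j, i, halo=2):
    """Scan a (south_north, west_east) slab around WRF point (i, j) for
    NaNs and report the local min/max. WRF's error prints I,J as 1-based
    Fortran indices, so subtract 1 for 0-based numpy arrays."""
    jj, ii = j - 1, i - 1
    window = field[max(jj - halo, 0):jj + halo + 1,
                   max(ii - halo, 0):ii + halo + 1]
    return {"nan": bool(np.isnan(window).any()),
            "min": float(np.nanmin(window)),
            "max": float(np.nanmax(window))}

# Demo on a synthetic temperature slab with one bad point at I,J = 217,14
# (the location printed in the error message):
t = np.full((200, 300), 290.0)
t[13, 216] = np.nan
print(check_point(t, j=14, i=217))  # flags the NaN near the crash location

# For the real files, load the slab with netCDF4 first, e.g. (T_NDG_OLD is
# an assumed variable name -- verify with `ncdump -h wrffdda_d03`):
#   from netCDF4 import Dataset
#   with Dataset("wrffdda_d03") as ds:
#       t = ds.variables["T_NDG_OLD"][0, 0, :, :]
```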
 
Hi, thanks for the reply. As we are going through scheduled HPC maintenance, I am not able to test whether the nudging files are corrupted. But I don't think they are corrupted or have missing data, since they were used to run WRF3.8 and it is running without any issues (even after SKINTEMP was renamed to TSK in the NARR met files). I am guessing something might go wrong when running real.exe in WRF4.5.1. Is it possible to encounter this issue if WPS and WRF are not compiled with the same versions/types of netCDF and other libraries (JasPer, zlib, libpng, etc.)?

The WPS compilation by our system administrator used netcdf-combined (netcdf-c-4.7.0-gcc9.1.0 and netcdf-fortran-4.4.5-gcc9.1.0) with HDF5 compression, but for the WRF4.5.1 compilation I used netCDF/4.3.3.1 without netCDF/HDF5 compression. This was mainly done to match the netCDF decompression ability of WRF3.8 (which is running), which used met files from the new WPS (compiled with netcdf-combined).

To simplify:
1. WPS (netcdf-combined, with compression) --> OBSGRID (compiled with netCDF/4.3.3.1 and ifort) --> WRF3.8 (netCDF/4.3.3.1, without compression) --> 2023 summer simulation running.
2. WPS (netcdf-combined, with compression) --> OBSGRID (compiled with netCDF/4.3.3.1 and ifort) --> WRF4.5.1 (netCDF/4.3.3.1, without compression) --> 2023 summer simulation NOT running; error: RIBX never exceeds RIC.
 
Apologies for the long delay in response while our team tended to time-sensitive obligations. Thank you for your patience.

"Is it possible to encounter this issue if the WPS and WRF are not compiled with the same versions/type of netcdf and other modules (jasper, zlib, png etc)?"
This error message shouldn't be generated due to any differences in versions. It's likely related to physics code modifications over the years.

You mentioned before that you don't get any error if you don't use PX LSM. Would that be a reasonable option for you - to use a different LSM?
 