Severe CFL Instabilities at 40km Domain (Brazil) on HPC Cluster: Seeking Balance Between Stability and Processing Speed

jcoutinho

Hi everyone,

I am currently running a WRF simulation over a large domain covering Brazil to study hydro-climatological impacts and forest-cover changes. I am hitting a persistent numerical instability (segmentation faults preceded by v_cfl and w_cfl violations) unless I drop the time step to a very conservative 60 seconds, which drastically slows my throughput.
I am looking for advice on optimizing my namelist.input physics and dynamics settings so I can safely increase the time step (ideally to 120 s or 180 s) while still outputting daily precipitation variables.

Here are the details of my setup:
1. Domain Configuration
Resolution (DX, DY): 40 km (40,000 m)
Grid Size: e_we = 315, e_sn = 275, e_vert = 45
Target Window: 5-year simulation (Currently benchmarking January 1985)
2. Current Physics Suite (&physics)
Microphysics (mp_physics): 3 (WSM3)
Longwave/Shortwave Radiation (ra_lw_physics, ra_sw_physics): 4 (RRTMG) / 4 (RRTMG)
Surface Layer (sf_sfclay_physics): 1 (MM5)
Land Surface (sf_surface_physics): 2 (Noah LSM)
PBL Scheme (bl_pbl_physics): 1 (YSU)
Cumulus Scheme (cu_physics): 16 (New Tiedtke)
3. Current Dynamics Settings (&dynamics)
hybrid_opt = 2
w_damping = 1
diff_6th_opt = 2, diff_6th_factor = 0.12
4. HPC / Cluster Environment
Running via Slurm on an AMD-based cluster (Hopper).
Using 8 nodes (256 total cores).

The Dilemma: At time_step = 240 the model crashes within 12 minutes; at time_step = 180 it crashes within 51 minutes, with w_cfl values peaking at 5.51. It only runs stably at time_step = 60, which makes a 5-year simulation computationally infeasible on my current allocation.

My Questions:
Stability vs. Speed: For a 40 km domain over South America (with both Andes-generated boundary noise and intense tropical convection), what combination of &dynamics settings would let me run at a time_step of 120 s or 180 s without triggering segmentation faults? Should I add Rayleigh damping (damp_opt = 3), gravity wave drag (gwd_opt = 1), or stronger acoustic-step damping (raising epssm above its 0.1 default)?
Daily Precipitation Output: I want clean daily accumulated precipitation to validate against observational data (e.g. CHIRPS). Because the WRF variables RAINC and RAINNC accumulate from the start of the run, what is the cleanest way to configure the namelist to reset or isolate daily totals? Should I use the bucket_mm option or configure a dedicated auxiliary history stream (auxhist*)?
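For the stability question, this is the kind of &dynamics block I am considering. The values are the defaults or commonly recommended starting points from the WRF documentation, not settings tuned for this domain, so please correct me if any look wrong:

```
&dynamics
 w_damping       = 1,       ! already on: caps excessive vertical velocities
 damp_opt        = 3,       ! Rayleigh damping of w near the model top
 zdamp           = 5000.,   ! depth (m) of the upper damping layer
 dampcoef        = 0.2,     ! damping coefficient for damp_opt = 3
 epssm           = 0.1,     ! acoustic-step off-centering (default; larger damps more)
 diff_6th_opt    = 2,       ! monotonic 6th-order diffusion (as in my current setup)
 diff_6th_factor = 0.12,
/
```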
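For the precipitation question, one namelist route I have seen suggested is the &physics bucket/accumulation options below. I am not certain prec_acc_dt is supported by every microphysics/cumulus combination, so treat this as a sketch to be checked rather than a known-good configuration:

```
&physics
 prec_acc_dt = 1440.,   ! accumulation window in minutes (1440 = daily PREC_ACC_C / PREC_ACC_NC)
 bucket_mm   = 100.,    ! optional: tip RAINC/RAINNC into I_RAINC/I_RAINNC integer buckets
/
```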
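Independently of the namelist route, daily totals can also be recovered in post-processing by differencing the cumulative fields. A minimal sketch in plain NumPy, assuming the daily-sampled values have already been extracted from the wrfout files (the function name is mine, not a WRF utility):

```python
import numpy as np

def daily_totals(cumulative):
    """Daily precipitation from a run-cumulative series sampled once per day.

    `cumulative` should be RAINC + RAINNC at 00 UTC each day; if bucket_mm
    is enabled, use RAINC + RAINNC + bucket_mm * (I_RAINC + I_RAINNC) instead.
    """
    c = np.asarray(cumulative, dtype=float)
    # Each day's total is the difference between consecutive cumulative values.
    return np.diff(c)
```

For example, a cumulative series [0, 5, 12, 12] yields daily totals [5, 7, 0].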

Thanks in advance,
Jaqueline
 
Try epssm = 0.9 first. Also try reducing the core count to 128: with 256 cores, your 315 x 275 grid decomposes into patches of only about 20 x 17 points each, and something fishy may be going on with such small patches after decomposition.

If it still fails, try to determine where the CFL violations occur. Is it the mountain areas, the domain boundary zone, or something else?
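A quick scan of the per-rank logs can show where the violations cluster. A sketch, assuming the standard rsl.error.* naming in the run directory (the helper function is mine; WRF prints the grid indices of each violation in those lines, so inspect the matching lines themselves to map them to the Andes or the boundary zone):

```python
import glob
from collections import Counter

def tally_cfl(pattern="rsl.error.*"):
    """Count lines mentioning CFL in each MPI rank's rsl.error.* log."""
    hits = Counter()
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as f:
            hits[path] = sum(1 for line in f if "cfl" in line.lower())
    # Keep only ranks that actually reported violations.
    return {p: n for p, n in hits.items() if n}
```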
 