wrf.exe crashing at initialization before writing 1st wrfout

frediani
Posts: 2
Joined: Tue Apr 20, 2021 4:56 am

Post by frediani » Wed Apr 21, 2021 12:05 am

Hi,

I'm trying to run a specific case study, but I can't get this simulation to finish writing the initialization files. I've tried a few different model versions, compilers, and CPUs. The model doesn't integrate a single time step; it crashes while trying to write the first wrfout file.

With 16 processors from 2 CPUs (8+8), I get this error:

forrtl: severe (408): fort: (2): Subscript #4 of the array SCALAR has value 2 which is greater than the upper bound of 1
Image              PC                Routine            Line        Source             
wrf.exe            000000000CF3DBCF  Unknown               Unknown  Unknown
wrf.exe            0000000001097AE9  force_domain_em_p       11088  module_dm.f90

With 12 processors from 1 CPU, I get this:

forrtl: error (73): floating divide by zero
Image              PC                Routine            Line        Source             
wrf.exe            000000000CF465FB  Unknown               Unknown  Unknown
libpthread.so.0    00002B33D88A4B00  Unknown               Unknown  Unknown
wrf.exe            00000000091E72C3  module_diffusion_        6963  module_diffusion_em.f90

Line 6963 of module_diffusion_em.f90 is this statement:

rdzw(i,k,j) = 1.0 / ( z_at_w(i,k+1,j) - z_at_w(i,k,j) )

I computed z_at_w and rdzw from wrfinput_d01 and wrfinput_d02, and they look OK. The netCDF files with these variables are in the real.exe tarball.
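A floating divide by zero at that statement means two adjacent w-levels in some column have the same height. The check can be sketched as below, assuming heights come from the geopotential as z_at_w = (PH + PHB) / g with g = 9.81 m s^-2 (the usual WRF convention), and that one column of PH + PHB has already been read from wrfinput with a netCDF reader (not shown). `rdzw_column` is a hypothetical helper, not a WRF routine, and it only checks the input files, not fields as they evolve inside wrf.exe.

```python
G = 9.81  # gravitational acceleration (m s^-2), assumed to match WRF's constant


def rdzw_column(geopotential):
    """Return (rdzw, bad_levels) for one column.

    geopotential: list of PH + PHB values (m^2 s^-2) on the w (full) levels,
    ordered bottom to top. Mirrors the Fortran statement
    rdzw(i,k,j) = 1.0 / (z_at_w(i,k+1,j) - z_at_w(i,k,j)), but records
    non-positive layer thicknesses instead of dividing by zero.
    Level indices here are 0-based (Fortran's k starts at 1).
    """
    z_at_w = [phi / G for phi in geopotential]
    rdzw = []
    bad_levels = []
    for k in range(len(z_at_w) - 1):
        dz = z_at_w[k + 1] - z_at_w[k]
        if dz <= 0.0:
            # A zero (or negative) thickness is what would trigger the
            # floating divide by zero reported in module_diffusion_em.
            bad_levels.append(k)
            rdzw.append(float("nan"))
        else:
            rdzw.append(1.0 / dz)
    return rdzw, bad_levels


# Illustrative column: the third and fourth w-levels coincide.
column = [0.0, 981.0, 1962.0, 1962.0, 4905.0]
rdzw, bad = rdzw_column(column)
print(bad)  # [2]
```

Running the same function over every (i, j) column of d01 and d02 would list any degenerate layers directly instead of relying on a visual check.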

The attached test runs use release-v4.2.2 (1e93b7e3), compiled with intel/19.1.1 and Intel MPI (impi) on Cheyenne.

I'm including the relevant files from real.exe, and from wrf.exe for the 12- and 16-processor tests. Tarball names containing "-D" correspond to executables compiled in debug mode with -D, and "noD" corresponds to the standard compilation.

I also tried release-v4.0.1 and release-v4.0.3, compiled with GNU and Intel, but I'm not including the test files for these.

The input data to WPS is from HRRR v3 at pressure levels, downloaded from http://hrrr.chpc.utah.edu/. Let me know if you'd like to see the WPS files.

I'd really appreciate any tips on how to identify the issue. Thank you so much!
Attachments
real_4.2.2_intel1911_noD.tgz (149.01 MiB)
wrf_4.2.2_intel1911-noD_cpu1x12.tgz (18.17 MiB)
wrf_4.2.2_intel1911-D-orig_cpu2x8.tgz (72.31 MiB)
wrf_4.2.2_intel1911-D-orig_cpu1x12.tgz (72.19 MiB)

kwerner
Posts: 2287
Joined: Wed Feb 14, 2018 9:21 pm

Re: wrf.exe crashing at initialization before writing 1st wrfout

Post by kwerner » Thu Apr 22, 2021 8:54 pm

*Updated*

Hi,
Thank you for providing the files. I ran a couple of tests using your namelist and wrfbdy/wrfinput* files. In the first test I used everything as-is (i.e., your namelist exactly as it is set up), and, as expected, the simulation failed immediately. In the second test I ran a single domain only, to see whether the problem was specific to d02, and that test ran without any problems. I can't say for sure, but I'm fairly confident the problem is related to your parent_grid_ratio (and parent_time_step_ratio). You are currently using a 9:1 ratio, which is a large jump. We typically recommend a 3:1 or 5:1 grid ratio, and never more than 7:1. I'm curious whether adding an intermediate nest between your parent and fine grids would make a difference.
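Because nest ratios multiply, the same final resolution can be reached with two 3:1 steps instead of one 9:1 step. A small sketch of that arithmetic, assuming d01 dx = 1 km (as stated for radt below), treating 3, 5, and 7 as acceptable per-nest ratios, and setting parent_time_step_ratio equal to parent_grid_ratio as is common practice. `split_ratio` and `nest_spacings` are hypothetical helpers; other nest settings (i_parent_start, e_we, etc.) are not addressed.

```python
from itertools import product


def split_ratio(total, allowed=(3, 5, 7), max_nests=3):
    """Find the shortest sequence of per-nest ratios whose product is `total`.

    Returns None if `total` cannot be built from the allowed ratios.
    """
    for n in range(1, max_nests + 1):
        for combo in product(allowed, repeat=n):
            prod = 1
            for r in combo:
                prod *= r
            if prod == total:
                return list(combo)
    return None


def nest_spacings(dx_parent_m, ratios):
    """Grid spacing (m) of each domain, starting with the parent."""
    dx = [dx_parent_m]
    for r in ratios:
        dx.append(dx[-1] / r)
    return dx


ratios = split_ratio(9)  # two 3:1 steps instead of a single 9:1 step
print("max_dom =", len(ratios) + 1)
print("parent_grid_ratio =", ", ".join(str(r) for r in [1] + ratios))
print("parent_time_step_ratio =", ", ".join(str(r) for r in [1] + ratios))
print("dx (m) =", [round(d, 1) for d in nest_spacings(1000.0, ratios)])
```

The innermost spacing is unchanged (about 111 m); only the jump between each parent and child is reduced.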

Additionally, there are a few things you should modify in your namelist.
1) debug_level. Set this to 0. It was originally added to the namelist for testing purposes, but it has recently been removed from the default namelists because it is rarely useful and mostly fills your rsl files with extra output, making them very large and hard to read.
2) radt. This should be set to the same value for each domain, and should be ~1 minute per km of d01 grid spacing. Since your d01 resolution is 1 km, you should set this to radt = 1, 1 (or radt = 1, 1, 1 if you add a third domain).
3) diff_opt. You should set this to the same value for all domains (diff_opt = 2, 2)
4) km_opt. You should set this to the same value for all domains (km_opt = 2, 2)
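The four namelist changes above can be checked mechanically. A minimal sketch, assuming the relevant values have already been read into a plain Python dict of per-domain lists (parsing namelist.input itself is not shown; `check_namelist` and the dict layout are hypothetical). It encodes only the rules stated above: debug_level = 0, radt identical on all domains and roughly equal to d01's dx in km, and identical diff_opt and km_opt across domains.

```python
def check_namelist(nml, tolerance=0.5):
    """Return a list of warnings for the settings discussed above.

    nml: dict of namelist values; per-domain settings are lists
    (e.g. nml["radt"] == [1, 1]); dx is in metres.
    """
    warnings = []
    if nml.get("debug_level", 0) != 0:
        warnings.append("debug_level should be 0")

    radt = nml["radt"]
    if len(set(radt)) != 1:
        warnings.append(f"radt differs between domains: {radt}")
    dx_d01_km = nml["dx"][0] / 1000.0
    # Rule of thumb: ~1 minute between radiation calls per km of d01 dx.
    if abs(radt[0] - dx_d01_km) > tolerance * max(dx_d01_km, 1.0):
        warnings.append(f"radt={radt[0]} min, expected ~{dx_d01_km:g} for d01 dx")

    for key in ("diff_opt", "km_opt"):
        if len(set(nml[key])) != 1:
            warnings.append(f"{key} differs between domains: {nml[key]}")
    return warnings


# Illustrative values only, not the poster's actual namelist.
example = {"debug_level": 100, "radt": [5, 1], "dx": [1000.0, 111.1],
           "diff_opt": [1, 2], "km_opt": [4, 2]}
for w in check_namelist(example):
    print(w)
```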

I would also recommend using more processors than 12 or 16. For your domain size, you would be perfectly safe to use 36. As for domain set-up, you can refer to this page for recommended practices for the namelist.wps parameters.
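For picking a processor count, an often-quoted rule of thumb from WRF support bounds it by the smallest domain: roughly (e_we/100)·(e_sn/100) tasks at minimum and (e_we/25)·(e_sn/25) at most, so each MPI patch keeps enough grid points. Treat it as approximate. A sketch of that arithmetic; the domain sizes below are hypothetical because the poster's e_we/e_sn values are not given in the thread.

```python
import math


def processor_range(e_we, e_sn):
    """Rule-of-thumb (min, max) MPI task counts for an e_we x e_sn domain."""
    low = max(1, math.floor((e_we / 100) * (e_sn / 100)))
    high = max(1, math.floor((e_we / 25) * (e_sn / 25)))
    return low, high


def check_tasks(ntasks, domains):
    """Check a task count against the smallest domain in `domains`.

    domains: list of (e_we, e_sn) tuples, one per domain.
    """
    e_we, e_sn = min(domains, key=lambda d: d[0] * d[1])
    low, high = processor_range(e_we, e_sn)
    return low <= ntasks <= high, (low, high)


# Hypothetical domain sizes, for illustration only.
domains = [(300, 300), (271, 271)]
for n in (12, 16, 36):
    ok, (low, high) = check_tasks(n, domains)
    print(n, "tasks:", "ok" if ok else "outside", f"[{low}, {high}]")
```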
NCAR/MMM
