
lowest levels in metgrid over complex terrain


peter_

Hi:
My initial and boundary condition data are the FNL 0.25 deg x 0.25 deg analyses. WRF 4.2 aborts shortly after startup (please see the log below). I am also including my namelist.
Code:
d02 2019-09-11_00:00:40  DEBUG wrf_timetoa():  returning with str = [2019-09-11_00:00:40]
d02 2019-09-11_00:00:40  call radiation_driver
d02 2019-09-11_00:00:40 Top of Radiation Driver
d02 2019-09-11_00:00:40 calling inc/HALO_PWP_inline.inc
d02 2019-09-11_00:00:40  call surface_driver
d02 2019-09-11_00:00:40 in SFCLAY

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x2aaaabc5333f in ???
#1  0x271f2dd in ???
#2  0x2722a46 in ???
#3  0x2727c52 in ???
#4  0x1f0c1ba in ???
#5  0x17bab87 in ???
#6  0x12f9901 in ???
#7  0x11b7ae4 in ???
#8  0x4735fa in ???
#9  0x473bda in ???
#10  0x406213 in ???
#11  0x405bcc in ???
#12  0x2aaaabc3f494 in ???
#13  0x405c03 in ???
#14  0xffffffffffffffff in ???

Code:
&time_control
run_days                 = 0,
run_hours                = 36,
run_minutes              = 0,
run_seconds              = 0,
start_year               = 2019,     2019,
start_month              = 9,        9,
start_day                = 11,       11,
start_hour               = 0,        0,
start_minute             = 00,       00,
start_second             = 00,       00,
end_year                 = 2019,     2019,
end_month                = 9,        9,
end_day                  = 12,       12,
end_hour                 = 12,       12,
end_minute               = 00,       00,
end_second               = 00,       00,
interval_seconds         = 21600,
input_from_file          = .true.,   .true.,
history_interval         = 180,       6,
history_outname          = "/scratch/peter/wrfout_d<domain>_<date>"
frames_per_outfile       = 1000,     110,
restart                  = .false.,
restart_interval         = 5000,
io_form_history          = 2,
io_form_restart          = 2,
io_form_input            = 2,
io_form_boundary         = 2,
debug_level              = 1000,
/

&domains
time_step                = 30,
time_step_fract_num      = 0,
time_step_fract_den      = 1,
max_dom                  = 2,
e_we                     = 364,      697,
e_sn                     = 382,      574,
e_vert                   = 100,      100,
p_top_requested          = 100,
num_metgrid_levels       = 34,
num_metgrid_soil_levels  = 4,
dx                       = 9000,     3000,
dy                       = 9000,     3000,
grid_id                  = 1,        2,
parent_id                = 1,        1,
i_parent_start           = 1,       46,
j_parent_start           = 1,       82,
parent_grid_ratio        = 1,        3,
parent_time_step_ratio   = 1,        3,
feedback                 = 1,
smooth_option            = 0,
max_dz                   = 600.,
auto_levels_opt          = 2,
zap_close_levels         = 0.1,
lagrange_order           = 1,
smooth_cg_topo           = .true.,
/

&physics
mp_physics               = 3,        3,
ra_lw_physics            = 1,        1,
ra_sw_physics            = 1,        1,
radt                     = 30,       30,
sf_sfclay_physics        = 1,        1,
sf_surface_physics       = 2,        2,
bl_pbl_physics           = 1,        1,
bldt                     = 0,        0,
cu_physics               = 0,        0,
cudt                     = 5,        5,
isfflx                   = 1,
ifsnow                   = 0,
icloud                   = 1,
surface_input_source     = 1,
num_soil_layers          = 4,
sf_urban_physics         = 0,        0,
maxiens                  = 1,
maxens                   = 3,
maxens2                  = 3,
maxens3                  = 16,
ensdim                   = 144,
/

&fdda
/

&dynamics
w_damping                = 0,
diff_opt                 = 1,
km_opt                   = 4,
diff_6th_opt             = 0,        0,
diff_6th_factor          = 0.12,     0.12,
base_temp                = 290.,
iso_temp                 = 200.,
base_pres_strat          = 1300,
base_lapse_strat         = -40.,
damp_opt                 = 3,
zdamp                    = 10000.,   10000.,
dampcoef                 = 0.2,      0.2,
khdif                    = 0,        0,
kvdif                    = 0,        0,
non_hydrostatic          = .true.,   .true.,
moist_adv_opt            = 1,        1,
scalar_adv_opt           = 1,        1,
/

&bdy_control
spec_bdy_width           = 5,
spec_zone                = 1,
relax_zone               = 4,
specified                = .true.,  .false.,
nested                   = .false.,   .true.,
/

&grib2
/

&namelist_quilt
nio_tasks_per_group      = 0,
nio_groups               = 1,
/
I had a look at the variable GHT in my met_em.* files. Over complex terrain there is an inconsistency in the lowest two levels: in several areas the second level lies below the first level. I was told that with ERA5 forcing data I should only use 4-point interpolators for GHT in the metgrid interpolation table, and that worked fine. However, this procedure failed with the NCEP FNL analysis. At first sight real.exe seems to fix the problem, since in wrfinput the pressure decreases and the geopotential height increases with increasing level number. However, wrf.exe takes a few initial steps and then crashes. I wonder if I should check the integrity of some additional variables in wrfinput or wrfbdy to see whether the problem in the lowest levels of met_em.* is somehow transferred to the initial and boundary conditions. It is remarkable that SFCLAY is the last subroutine before the fault, as it deals with the lowest levels. Any suggestion on how to solve this metgrid issue?
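For reference, a minimal scan of the met_em files for such inversions (just a sketch, assuming the netCDF4 Python package; the filename is a placeholder) could look like this:
Code:
import numpy as np
from netCDF4 import Dataset

# Placeholder filename: point this at one of the met_em files to check.
nc = Dataset("met_em.d01.2019-09-11_00:00:00.nc")
ght = nc.variables["GHT"][0]   # (num_metgrid_levels, south_north, west_east)

# Compare each metgrid level with the one below it and report inversions.
for k in range(1, ght.shape[0]):
    bad = int((ght[k] < ght[k - 1]).sum())
    if bad:
        print(f"level {k + 1}: {bad} points lie below level {k}")
nc.close()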
 
I have moved this topic from the metgrid section to the wrf.exe section of the forum. Since this is likely not a metgrid problem, the first step should be to determine the cause of the error in wrf.exe. Someone will respond to your inquiry soon.
 
I have a few questions about this case:
(1) Would you please check your FNL quarter-degree data and make sure it extends high enough for your case? I notice that you set the model top to 1 mb, which is pretty high.
(2) If you run this case with a lower model top, e.g., 50 hPa, and a reduced number of vertical levels, can the model run successfully?
(3) If you run this case with ERA5 as input, can the model run successfully?
(4) If a case fails immediately, it often indicates either that the input data are wrong or that there is not enough memory. Please make sure you have sufficient memory to run this case.
 
Thank you for the reply. Please find my answers below.
(1) Please see https://rda.ucar.edu/datasets/ds083.3/#!description. All mandatory input data are available up to 1 mb. I was already able to run a similar case, but now I introduced zap_close_levels to ensure that the model uses the forcing data at the highest levels, and this modification apparently leads to the crash.
(2) Yes. However, there apparently is a delicate balance between the number of levels and the value of zap_close_levels needed to reach a successful run. Maybe some specific guidance on this issue from wrfhelp is needed.
(3) ERA5 goes even higher and I had problems with it, so I moved to the FNL data, which have a lower top. With a model top as high as 30 mb I was able to run WRF with ERA5 forcing data.
(4) I have been using 100 cores across 4 nodes with 125 MB of memory each. Does that look insufficient for my namelist?
Sorry to ask again: is real.exe able to fix inconsistent GHT levels in met_em.* and avoid erroneous initial data? Best regards
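P.S. A rough way to check whether the met_em inconsistency made it into the initial conditions, assuming the netCDF4 Python package and the standard wrfinput fields (PH, PHB, P, PB), might be:
Code:
import numpy as np
from netCDF4 import Dataset

nc = Dataset("wrfinput_d01")
# Full geopotential height (m) on the staggered levels and full pressure (Pa).
z = (nc.variables["PH"][0] + nc.variables["PHB"][0]) / 9.81
p = nc.variables["P"][0] + nc.variables["PB"][0]

# Heights should increase and pressure should decrease with level everywhere.
print("non-increasing height intervals:", int(np.sum(np.diff(z, axis=0) <= 0.0)))
print("non-decreasing pressure intervals:", int(np.sum(np.diff(p, axis=0) >= 0.0)))
nc.close()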
 
Peter,

1. A model lid of 1 mb is pretty high up, but that should not cause trouble on its own.
2. Unrelated to the model blowing up: with a model lid at about 50 mb or higher, use the RRTMG radiation schemes. There is no ozone in option 1, so you could get a sizable temperature bias aloft.
3. Your definition of the stratosphere is not the standard atmosphere, but again, we put those switches there for people to use.
4. Using 100 cores is probably reasonable, and I doubt that memory is the reason the model is failing. A simple test is to run with max_dom = 1; if it still fails, we know memory is not the issue.
5. For the output from metgrid, when the data are isobaric (as the FNL data probably are), the k=1 level is the surface, and k=2 through N are the isobaric surfaces (1000, 950, 900, etc. hPa, up through 1 hPa). Over topography it is not unusual for the surface height at k=1 to be larger than the 1000 hPa height at k=2, which sits somewhere under the ground. This is not a concern (see the sketch just after this list). For ERA5, the data could have been on the native model levels, which use a different vertical storage scheme.
6. Other than telling you the last physics routine WRF entered, the debug switch is not much help. Go ahead and set it back to zero so that you can more easily interpret the output from subsequent runs.
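To see this layering directly, you could print a single column from a met_em file; a rough sketch with the netCDF4 Python package (the filename and the chosen column are placeholders):
Code:
from netCDF4 import Dataset

nc = Dataset("met_em.d01.2019-09-11_00:00:00.nc")   # placeholder filename
j, i = 100, 100                                     # placeholder column; pick one over high terrain
pres = nc.variables["PRES"][0, :, j, i]             # Pa
ght  = nc.variables["GHT"][0, :, j, i]              # m
for k in range(pres.size):                          # k+1 matches the 1-based k above
    print(f"k={k + 1:3d}  p={pres[k] / 100.0:8.1f} hPa  z={ght[k]:9.1f} m")
nc.close()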

Here is a typical strategy that we would employ since the model dies quickly. Build the code to run without optimization, and also to give you some additional debug information. This is done by doing the
Code:
./clean -a
step, and then
Code:
./configure -d
to set up the compilation options (-d means debug). Do a single-domain run: based on your setup it will be MUCH faster, about 10x faster than running both domains. One of the rsl files should give you some traceback info (I look for the largest rsl files). We want to know what is causing the trouble (is it a floating-point error, is it a seg fault). The traceback will tell us this, and at least one of the rsl files will refer to a line number in the offending *.f90 file (the file that was actually compiled). Remember to look up that line number in the mentioned *.f90 file, and then edit the associated *.F file to make any changes; the *.f90 files are intermediate, discarded, and overwritten upon a rebuild.
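If it helps, a small script to pull that information out of the run directory (just a sketch, assuming the usual rsl.error.* naming):
Code:
import glob, os

# The traceback from a debug build usually lands at the end of the largest
# rsl.error.* file; print its tail to find the *.f90 file and line number.
rsl = max(glob.glob("rsl.error.*"), key=os.path.getsize)
print("largest rsl file:", rsl)
with open(rsl) as f:
    for line in f.readlines()[-40:]:
        print(line.rstrip())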

There are a couple of ways to go after this type of "probably an IC problem": put in some print statements around the offending line to track down the (i,j) location, or look at the initial conditions for the fields of concern and check for discrepancies.

Let's iterate after you get a line number and some leads on what the possible errors are. Model runs that die within a few time steps are relatively easy to diagnose. Given your info about using ERA5 and now FNL, and your specific settings in the namelist, you seem to know your way around the model. The problem is likely what you have identified: something happening to the initial condition data in the real program. However, it is usually easiest to let the model find the problem with the data (it did find the trouble, and it died). Now we find out where in the model it died, where in the grid it died (location, field), and then fix that IC data with some mods or namelist options in real.
 