WRF V3.9 on FreeBSD

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

rpasken
I have been working on getting WRF V3.9 running on FreeBSD. I am using WRF V3.9 because I have not qualified a V4 version for use in my real-time forecasts. So far I have fixed all the coding errors in geogrid, ungrib and metgrid, all of which were character array length errors in date/time strings. In some places the strings were declared to be 132 characters long, in others 12, and in others 19. Setting the string lengths to 19 characters allows geogrid, ungrib and metgrid to produce met_em files. I looked at the met_em files and they appear to match the met_em files created under CentOS 7. I have compiled wrf and real successfully, and real runs and produces the wrfinput and wrfbdy files. When I run wrf.exe, it fails with the following in the rsl.error.0000 file:
taskid: 0 hostname: swan.localdomain
module_io_quilt_old.F 2931 F
Quilting with 1 groups of 0 I/O tasks.
Ntasks in X 1 , ntasks in Y 1
--- WARNING: traj_opt is zero, but num_traj is not zero; setting num_traj to zero.
--- NOTE: sst_update is 0, setting io_form_auxinput4 = 0 and auxinput4_interval = 0 for all domains
--- NOTE: grid_fdda is 0 for domain 1, setting gfdda interval and ending time to 0 for that domain.
--- NOTE: both grid_sfdda and pxlsm_soil_nudge are 0 for domain 1, setting sgfdda interval and ending time to 0 for that domain.
--- NOTE: obs_nudge_opt is 0 for domain 1, setting obs nudging interval and ending time to 0 for that domain.
--- NOTE: bl_pbl_physics /= 4, implies mfshconv must be 0, resetting
Need MYNN PBL for icloud_bl = 1, resetting to 0
--- NOTE: RRTMG radiation is not used, setting: o3input=0 to avoid data pre-processing
*************************************
No physics suite selected.
Physics options will be used directly from the namelist.
*************************************
--- NOTE: num_soil_layers has been set to 4
WRF V3.9.1.1 MODEL
*************************************
Parent domain
ids,ide,jds,jde 1 205 1 171
ims,ime,jms,jme -4 210 -4 176
ips,ipe,jps,jpe 1 205 1 171
*************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
alloc_space_field: domain 1 , 725245588 bytes allocated
wrf main: calling open_r_dataset for wrfinput
med_initialdata_input: calling input_input
mminlu = 'MODIFIED_IGBP_MODIS_NOAH'
Timing for processing wrfinput file (stream 0) for domain 1: 0.56921 elapsed seconds
Max map factor in domain 1 = 1.00. Scale the dt in the model accordingly.
WRF TILE 1 IS 1 IE 205 JS 1 JE 171
set_tiles3: NUMBER OF TILES = 1
INPUT LandUse = "MODIFIED_IGBP_MODIS_NOAH"
LANDUSE TYPE = "MODIFIED_IGBP_MODIS_NOAH" FOUND 33 CATEGORIES 2 SEASONS WATER CATEGORY = 17 SNOW CATEGORY = 15
INITIALIZE THREE Noah LSM RELATED TABLES
Skipping over LUTYPE = USGS
LANDUSE TYPE = MODIFIED_IGBP_MODIS_NOAH FOUND 20 CATEGORIES
INPUT SOIL TEXTURE CLASSIFICATION = STAS
SOIL TEXTURE CLASSIFICATION = STAS FOUND 19 CATEGORIES
calling inc/HALO_EM_INIT_1_inline.inc
calling inc/HALO_EM_INIT_2_inline.inc
calling inc/HALO_EM_INIT_3_inline.inc
calling inc/HALO_EM_INIT_4_inline.inc
calling inc/HALO_EM_INIT_5_inline.inc
calling inc/PERIOD_BDY_EM_INIT_inline.inc
calling inc/PERIOD_BDY_EM_MOIST_inline.inc
calling inc/PERIOD_BDY_EM_TKE_inline.inc
calling inc/PERIOD_BDY_EM_SCALAR_inline.inc
calling inc/PERIOD_BDY_EM_CHEM_inline.inc
calling inc/HALO_EM_INIT_1_inline.inc
calling inc/HALO_EM_INIT_2_inline.inc
calling inc/HALO_EM_INIT_3_inline.inc
calling inc/HALO_EM_INIT_4_inline.inc
calling inc/HALO_EM_INIT_5_inline.inc
calling inc/PERIOD_BDY_EM_INIT_inline.inc
calling inc/PERIOD_BDY_EM_MOIST_inline.inc
calling inc/PERIOD_BDY_EM_TKE_inline.inc
calling inc/PERIOD_BDY_EM_SCALAR_inline.inc
calling inc/PERIOD_BDY_EM_CHEM_inline.inc
d01 2021-12-01_18:00:00 open_hist_w : opening wrfout_d01_2021-12-01_18:00:00 for writing.
d01 2021-12-01_18:00:00 Information: NOFILL being set for writing to wrfout_d01_2021-12-01_18:00:00
d01 2021-12-01_18:00:00 med_hist_out: opened wrfout_d01_2021-12-01_18:00:00 as DATASET=HISTORY
Timing for Writing wrfout_d01_2021-12-01_18:00:00 for domain 1: 1.01156 elapsed seconds
d01 2021-12-01_18:00:00 mminlu = 'MODIFIED_IGBP_MODIS_NOAH'
d01 2021-12-01_18:00:00 Warning LEN CHAR STRING > LEN DATA in ext_ncd_get_var_td.code CHAR, line 189
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: module_date_time.G LINE: 911
WRFU_TimeSet() in wrf_atotime() FAILED Routine returned error code = -1
-------------------------------------------
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0


I am looking for suggestions on where to look. Yes, I ran gdb on wrf.exe and traced it into a WRFU reference.
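When the rsl file is long, it can help to pull out just the FATAL blocks so that the reporting file, line number and message are easy to compare between runs. The following is a minimal sketch, not part of WRF; the function name `fatal_calls` is hypothetical, and the regular expression assumes the block layout shown in the log above.

```python
import re

# Assumes the layout seen in the rsl.error.0000 excerpt:
#   FATAL CALLED FROM FILE: <file> LINE: <number>
#   <message on the next line>
FATAL_RE = re.compile(
    r"FATAL CALLED FROM FILE:\s*(?P<file>\S+)\s+LINE:\s*(?P<line>\d+)\s*\n"
    r"(?P<msg>[^\n]*)"
)


def fatal_calls(log_text):
    """Return a list of (file, line, message) tuples, one per FATAL block."""
    return [
        (m.group("file"), int(m.group("line")), m.group("msg").strip())
        for m in FATAL_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    excerpt = (
        "-------------- FATAL CALLED ---------------\n"
        "FATAL CALLED FROM FILE: module_date_time.G LINE: 911\n"
        "WRFU_TimeSet() in wrf_atotime() FAILED Routine returned error code = -1\n"
        "-------------------------------------------\n"
    )
    for source_file, line_no, message in fatal_calls(excerpt):
        print(source_file, line_no, message)
```

In practice the text would come from reading the rsl.error.0000 file; the excerpt here is copied from the log above.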
 
Did this case fail immediately after wrf.exe started? How did you compile WRF, and what input data did you ungrib to drive the WRF run? Please send me your namelist.input so I can take a look. Thanks.
 
The namelist is below:
&time_control
run_days = 0,
run_hours = 1,
run_minutes = 0,
run_seconds = 0,
start_year = 2021, 2021,
start_month = 12, 12,
start_day = 01, 01,
start_hour = 18, 18,
start_minute = 00, 00,
start_second = 00, 00,
end_year = 2021, 2021,
end_month = 12, 12,
end_day = 02, 02,
end_hour = 00, 00,
end_minute = 00, 00,
end_second = 00, 00,
interval_seconds = 10800,
input_from_file = .true., .true.,
history_interval = 15, 15,
frames_per_outfile = 49, 49,
restart = .false.,
restart_interval = 5000,
io_form_history = 2,
io_form_restart = 2,
io_form_input = 2,
io_form_boundary = 2,
debug_level = 9,
auxinput11_interval_s = 360,
auxinput11_end_h = 120,
io_form_auxhist2 = 2,
/

&domains
time_step = 18,
time_step_fract_num = 0,
time_step_fract_den = 1,
max_dom = 1,
e_we = 205, 109,
e_sn = 171, 127,
e_vert = 35, 35,
sfcp_to_sfcp = .true.
p_top_requested = 5000,
num_metgrid_levels = 41,
num_metgrid_soil_levels = 9,
dx = 3000, 1000,
dy = 3000, 1000,
grid_id = 1, 2,
parent_id = 1, 1,
i_parent_start = 1, 148,
j_parent_start = 1, 78,
parent_grid_ratio = 1, 3,
parent_time_step_ratio = 1, 3,
feedback = 1,
smooth_option = 0,
use_adaptive_time_step = .true.,
step_to_output_time = .true.
target_cfl = 1.0
max_step_increase_pct = 20
starting_time_step = -1,
max_time_step = -1,
min_time_step = -1,
adaptation_domain = 1
/

&afwa
afwa_diag_opt = 1, 1,
afwa_ptype_opt = 1, 1,
afwa_radar_opt = 1, 1,
afwa_severe_opt = 1, 1,
afwa_icing_opt = 1, 1,
afwa_vis_opt = 1, 1,
afwa_cloud_opt = 1, 1,
/


&physics
mp_physics = 6, 6,
ra_lw_physics = 1, 1,
ra_sw_physics = 1, 1,
radt = 10, 10,
sf_sfclay_physics = 1, 1,
sf_surface_physics = 2, 2,
bl_pbl_physics = 1, 1,
bldt = 0, 0,
cu_physics = 0, 0,
cudt = 5, 5,
isfflx = 1,
ifsnow = 0,
icloud = 1,
surface_input_source = 1,
num_soil_layers = 4,
sf_urban_physics = 0, 0,
maxiens = 1,
maxens = 3,
maxens2 = 3,
maxens3 = 16,
ensdim = 144,
/

&dynamics
w_damping = 0,
diff_opt = 1,
km_opt = 4,
diff_6th_opt = 0, 0,
diff_6th_factor = 0.12, 0.12,
base_temp = 290.,
damp_opt = 0,
zdamp = 5000., 5000.,
dampcoef = 0.2, 0.2,
khdif = 0, 0,
kvdif = 0, 0,
non_hydrostatic = .true., .true.,
moist_adv_opt = 1, 1,
scalar_adv_opt = 1, 1,
/

&bdy_control
spec_bdy_width = 5,
spec_zone = 1,
relax_zone = 4,
specified = .true., .false.,
nested = .false., .true.,
/

&grib2
/

&namelist_quilt
nio_tasks_per_group = 0,
nio_groups = 1,
/


This is the identical namelist.input that is currently being used on the production CentOS 7 cluster for WRF V3.9. As you can see in the rsl.error.0000 output posted earlier, the initial conditions have been written (Timing for Writing wrfout_d01_2021-12-01_18:00:00 for domain 1: 1.01156 elapsed seconds). It then fails with (d01 2021-12-01_18:00:00 Warning LEN CHAR STRING > LEN DATA in ext_ncd_get_var_td.code CHAR, line 189).
 
Your namelist.input looks fine. You set num_metgrid_soil_levels = 9; can you tell me what data you used to drive this case?
I suspect this is a data issue, since the model crashed immediately after wrf.exe started.

Let's try the following:
(1) Turn off the adaptive time step. I don't think this is the reason for the failure, but it will make the issue less complicated.
(2) Use a different dataset to run the same case, for example GFS quarter-degree data.

We would like to first make sure this is not a data issue.
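For step (1), the only change needed is to set use_adaptive_time_step to .false. in the &domains section of namelist.input. If the same change has to be applied on several machines, it could be scripted. This is a minimal sketch under stated assumptions: the helper name `set_namelist_value` is hypothetical, it handles only simple one-line `key = value` entries like those in the namelist above, and it is not a full Fortran namelist parser.

```python
import re


def set_namelist_value(text, key, value):
    """Replace the value of a one-line `key = value` namelist entry.

    Raises KeyError if the key does not appear at the start of any line.
    """
    # Match the key only when it starts the line and is followed by '=',
    # so 'time_step' does not also hit 'time_step_fract_num'.
    pattern = re.compile(rf"^(\s*{re.escape(key)}\s*=\s*).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + value + ",", text)
    if count == 0:
        raise KeyError(key)
    return new_text


if __name__ == "__main__":
    namelist = (
        "&domains\n"
        "time_step = 18,\n"
        "use_adaptive_time_step = .true.,\n"
        "/\n"
    )
    print(set_namelist_value(namelist, "use_adaptive_time_step", ".false."))
```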
 
The data set I used is a copy of the data set that ran successfully on a CentOS 7 and a CentOS 8 system. The compilers are at the same version, i.e., the CentOS 8 system is using gcc-8.5 and so is the FreeBSD system. I will turn off adaptive time stepping and rerun.
 
The attachment is a plot of temperature, made with NCL, from wrfout_d01_2021-12-01_18:00:00.
 

Attachments

  • plot_temp.png
 
If the data work fine on a different machine, can we say that the issue you have at present is machine-related?
 
That was the point of the question. Moving from a Linux-based OS to FreeBSD is where the problem lies. I spent a lot of time cleaning up problems with WPS. In WPS there were a lot of places where strings of characters were declared as character (len=xxx) in one subroutine and as integers in others, and places where strings were 132 characters long in one subroutine, 19 characters in another and 12 characters in others. I am assuming the same kind of problem exists in WRF. The question is where to look, since line 189 refers to WRFU, which I am having trouble finding.
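One way to narrow down where to look is to scan the Fortran sources for CHARACTER declarations and list any variable name that is declared with more than one length. The following is a rough sketch, not a Fortran parser: the function names `char_lengths` and `mismatched_lengths` are hypothetical, it only understands simple single-line declarations (no continuation lines), and the same name may legitimately have different lengths in unrelated routines, so the output is only a list of candidates to inspect.

```python
import re
from collections import defaultdict

# Recognises the start of a CHARACTER declaration and its length selector.
DECL_RE = re.compile(
    r"""^\s*character\b\s*
        (?:
            \(\s*(?:len\s*=\s*)?(?P<paren>[^)]*?)\s*\)   # character(len=19), character(19)
          | \*\s*(?P<star>\(\s*\*\s*\)|\w+)              # character*19, character*(*)
        )?
        (?P<rest>.*)$""",
    re.IGNORECASE | re.VERBOSE,
)


def char_lengths(source):
    """Map each declared character variable name to the set of lengths seen."""
    found = defaultdict(set)
    for line in source.splitlines():
        m = DECL_RE.match(line.split("!", 1)[0])  # drop trailing comments
        if not m:
            continue
        length = (m.group("paren") or m.group("star") or "1")
        length = length.replace(" ", "").upper()
        if length == "(*)":
            length = "*"  # assumed-length dummy argument
        rest = m.group("rest")
        names = rest.split("::", 1)[1] if "::" in rest else rest
        names = re.sub(r"\([^()]*\)", "", names)  # strip array bounds
        for part in names.split(","):
            name = re.match(r"\s*([A-Za-z_]\w*)", part)
            if name:
                found[name.group(1).lower()].add(length)
    return found


def mismatched_lengths(source):
    """Return only the names declared with more than one distinct length."""
    return {n: l for n, l in char_lengths(source).items() if len(l) > 1}
```

Running `mismatched_lengths` on the concatenated text of the files under suspicion gives a short list of names, such as date strings, whose declarations disagree.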
 
Would you please recompile WRF with the debug option, i.e.,
./clean -a
./configure -D
./compile em_real

Then rerun the case. With the debug option, the error message will specify the line and the source file in which the error first occurs.
 
I had already compiled with debug on (configure -D), but I did as you suggested and reran wrf.exe. The results are as before:


Timing for Writing wrfout_d01_2021-12-01_18:00:00 for domain 1: 0.99076 elapsed seconds
d01 2021-12-01_18:00:00 mminlu = 'MODIFIED_IGBP_MODIS_NOAH'
d01 2021-12-01_18:00:00 Warning LEN CHAR STRING > LEN DATA in ext_ncd_get_var_td.code CHAR, line 189
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: module_date_time.G LINE: 911
WRFU_TimeSet() in wrf_atotime() FAILED Routine returned error code = -1
-------------------------------------------
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: module_date_time.G LINE: 911
WRFU_TimeSet() in wrf_atotime() FAILED Routine returned error code = -1
-------------------------------------------
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
 
Please take a look at the code in WRF/external/io_netcdf/wrf_io.F90 and the subroutines it calls.
This could be a library issue or a machine issue.
 
I did some single stepping with gdb to trace things further. The 0th time step works correctly. On the next time step, however, it tries to process a date/time string that is declared as a 19-character string in wrf_io.f, but elsewhere the time is variously declared as:

CHARACTER(LEN=ESMF_MAXSTR) where ESMF_MAXSTR=128
or
CHARACTER*(*)

The biggest problem seems to be with the CHARACTER*(*) declarations: the first element in the array of strings is the correct date, but the following indices, which shouldn't exist, are referenced and each contains a null-terminated empty string.

Although I have experience writing device drivers, I don't have any background in compilers, so I do not understand how gcc under Linux handles array length and type mismatches differently from gcc under FreeBSD.

I am thinking about recompiling the NetCDF libraries using LLVM (clang and flang) and then compiling WRF with flang and clang to see if I can get better diagnostics.
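The failure mode described above, a valid first date string followed by empty null-terminated entries, can be illustrated outside WRF. The sketch below is not the WRF or WRFU implementation; it only assumes the 19-character `YYYY-MM-DD_hh:mm:ss` layout seen in the log, and the helper name `parse_wrf_date` is hypothetical. It shows that a blank-padded field (as Fortran would pad it) still parses, while an empty or NUL-filled field cannot produce a valid time, which is consistent with a time-setting routine returning an error code.

```python
from datetime import datetime

WRF_DATE_FORMAT = "%Y-%m-%d_%H:%M:%S"  # layout seen in the rsl log
WRF_DATE_LEN = 19


def parse_wrf_date(raw):
    """Parse a fixed-width date field; return None if it holds no valid date."""
    # Fortran pads character variables with blanks, while a C-style buffer
    # may be NUL-terminated; cut at the first NUL and drop trailing blanks.
    text = raw.split("\x00", 1)[0].rstrip(" ")
    if len(text) != WRF_DATE_LEN:
        return None
    try:
        return datetime.strptime(text, WRF_DATE_FORMAT)
    except ValueError:
        return None


if __name__ == "__main__":
    fields = [
        "2021-12-01_18:00:00",                # exact 19-character string
        "2021-12-01_18:00:00".ljust(128),     # blank-padded to 128 characters
        "\x00" * 19,                          # NUL-filled entry that should not exist
    ]
    for field in fields:
        print(repr(field[:24]), "->", parse_wrf_date(field))
```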