
Model crashing at radiation time step

neel14

WRF-4.1.3
The model crashes at the radiation time step when I use the BouLac PBL scheme, but it runs successfully with other PBL schemes. I tried changing radt to other values, but the model still crashes with a similar error. Any help would be appreciated.

Code:
 &time_control
 run_days                            = 0,
 run_hours                           = 00,
 run_minutes                         = 0,
 run_seconds                         = 0,
 start_year                          = 2018, 2018,2018,
 start_month                         = 03,   03,   03,
 start_day                           = 31,   31,   31,
 start_hour                          = 00,   00,   00,
 end_year                            = 2018, 2018,2018,
 end_month                           = 05,   05,   05,
 end_day                             = 01,   01,   01,
 end_hour                            = 03,   03,   03,
 interval_seconds                    = 10800,
 input_from_file                     = .true.,.true.,.true.,
 history_interval                    = 1440,  1440, 60,
 frames_per_outfile                  = 100000, 100000, 100000,
 restart                             = .false.,
 restart_interval                    = 1440,
 rst_outname                         = '/scratch/neeldip/OUTPUT/wrfrst_d<domain>_<date>',
 io_form_history                     = 2
 io_form_restart                     = 2
 io_form_input                       = 2
 io_form_boundary                    = 2
 history_outname                     = '/scratch/neeldip/OUTPUT/wrfout_d<domain>_<date>',
 debug_level                         = 0,
 output_diagnostics                  = 1,
 auxhist3_outname                    = '/scratch/neeldip/OUTPUT/wrfxtrm_d<domain>_<date>', 
 auxhist3_interval                   = 180, 180,60,     
 io_form_auxhist3                    = 2,
 frames_per_auxhist3                 = 100000, 100000, 100000,
 frames_per_auxhist23                = 100000, 100000, 100000,
 io_form_auxinput4                   = 2,
 auxinput4_interval                  = 180,
 auxinput4_inname                    = "wrflowinp_d<domain>"
 io_form_auxhist1                    = 2
 auxhist1_interval                   = 1440,  1440,   60,
 frames_per_auxhist1                 = 100000, 100000, 100000,
 auxhist1_outname                    = '/scratch/neeldip/OUTPUT/wrf_trad_fields_d<domain>_<date>',
 iofields_filename                   = "d01.txt","d01.txt","d03.txt",
 ignore_iofields_warning             = .true.,
 /

 &domains
 time_step                           = 120,
 time_step_fract_num                 = 0,
 time_step_fract_den                 = 1,
 max_dom                             = 3,
 e_we                                = 231,247,349,
 e_sn                                = 197,211,313,
 e_vert                              = 35,    35,    35,
 p_top_requested                     = 5000,
 num_metgrid_levels                  = 138,
 num_metgrid_soil_levels             = 4,
 dx                                  = 27000, 9000,3000, 
 dy                                  = 27000, 9000,3000,
 grid_id                             = 1,2,3,   
 parent_id                           = 1,1,2,   
 i_parent_start                      = 1,75,70,
 j_parent_start                      = 1,64,55,    
 parent_grid_ratio                   = 1,3,3,    
 parent_time_step_ratio              = 1,3,4,    
 feedback                            = 1,     
 max_ts_locs                         = 50, 
 ts_buf_size                         = 200,
 max_ts_level                        = 50,
 tslist_unstagger_winds              = .false.,
 use_adaptive_time_step              = .false.,  
 smooth_option                       = 2,
 smooth_cg_topo                      = .true.,
 /

 &physics
 mp_physics                          = 10,   10, 10,   
 progn                               = 1,     1,  1,  
 ra_lw_physics                       = 4,     4,  4,  
 ra_sw_physics                       = 4,     4,  4, 
 radt                                = 27,   27, 27,    
 sf_sfclay_physics                   = 1,     1,  1,  
 sf_surface_physics                  = 2,     2,  2,  
 bl_pbl_physics                      = 9,     9,  9,  
 bldt                                = 0,     0,  0, 
 cu_physics                          = 3,     3,  0,
 shcu_physics                        = 1,     1,  0,  
 cudt                                = 0,     0,  0,  
 cugd_avedx                          = 3,
 ishallow                            = 1,    
 cu_diag                             = 1,     
 shcu_aerosols_opt                   = 2,     2,  2,    
 isfflx                              = 1,
 ifsnow                              = 1,
 icloud                              = 1,
 surface_input_source                = 3,
 sf_urban_physics                    = 0,     0, 0,  
 maxiens                             = 1,
 maxens                              = 3,
 maxens2                             = 3,
 maxens3                             = 16,
 ensdim                              = 144,
 cu_rad_feedback                     = .true.,.true.,.false.,
 sst_update                          = 1,
 num_land_cat                        = 21,
 usemonalb                           = .true.,
 rdmaxalb                            = .true.,
 rdlai2d                             = .true.,
 num_land_cat                        = 17,
 /

 &fdda
 grid_fdda                           = 2, 2, 2,
 gfdda_inname                        = "wrffdda_d<domain>"
 gfdda_interval_m                    = 180,180,180,
 gfdda_end_h                         = 10000, 10000, 10000,
 io_form_gfdda                       = 2,
 fgdt                                = 0, 0, 0,
 fgdtzero                            = 0, 0, 0,
 if_no_pbl_nudging_uv                = 1, 1, 1,
 if_no_pbl_nudging_t                 = 1, 1, 1,
 if_no_pbl_nudging_ph                = 1, 1, 1,
 if_no_pbl_nudging_q                 = 1, 1, 1,
 guv                                 = 0.0003, 0.0003, 0.0003,
 gt                                  = 0.0003, 0.0003, 0.0003, 
 gq                                  = 0.00001, 0.00001, 0.00001,
 gph                                 = 0.0003, 0.0003, 0.0003, 
 ktrop                               = 0,
 xwavenum                            = 2, 1, 1,
 ywavenum                            = 2, 1, 1,
 /

 &dynamics
 hybrid_opt                          = 2,
 w_damping                           = 1,
 diff_opt                            = 2,      2,  2,    
 km_opt                              = 4,      4,  4,    
 diff_6th_opt                        = 2,      2,  2,   
 diff_6th_factor                     = 0.12,   0.12,  0.12, 
 base_temp                           = 290,
 damp_opt                            = 3,
 zdamp                               = 5000.,  5000., 5000.,
 dampcoef                            = 0.2,    0.2,   0.2, 
 khdif                               = 2700,    900,  300,   
 kvdif                               = 100,      100,   100,   
 non_hydrostatic                     = .true., .true., .true.,
 moist_adv_opt                       = 1,      1,   1,    
 scalar_adv_opt                      = 1,      1,   1,       
 chem_adv_opt                        = 1,      1,   1,
 gwd_opt                             = 1,
 etac                                = 0.1,
 epssm                               = 0.5,0.5,0.5,  
 /

 &bdy_control
 spec_bdy_width                      = 9,
 spec_zone                           = 1,
 relax_zone                          = 8, 
 specified                           = .true.
 /

 &grib2
 /

 &namelist_quilt
 nio_tasks_per_group = 0,
 nio_groups = 1,
 /

 &diags
 diag_nwp2 = 1
 /
 
Hi,
Can you package your wrf output error files (e.g., rsl.error.*) together into a single *.TAR file (not a *.rar file - we cannot open that format) and attach it so I can take a look? Thanks!
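
If it is useful, here is a minimal Python sketch for bundling the rsl files into one tar archive. The function name `bundle_rsl_files` and the default archive name are only illustrative, and it assumes the files sit in the run directory under the usual rsl.error.* and rsl.out.* names.

```python
import glob
import os
import tarfile


def bundle_rsl_files(run_dir=".", archive_name="rsl_files.tar"):
    """Pack every rsl.error.* and rsl.out.* file found in run_dir into one tar file.

    Returns the list of file names that were added (hypothetical helper,
    not part of WRF).
    """
    paths = sorted(
        glob.glob(os.path.join(run_dir, "rsl.error.*"))
        + glob.glob(os.path.join(run_dir, "rsl.out.*"))
    )
    # Mode "w" writes an uncompressed tar; use "w:gz" for a .tar.gz instead.
    with tarfile.open(archive_name, "w") as tar:
        for path in paths:
            # Store only the file name so the archive unpacks flat.
            tar.add(path, arcname=os.path.basename(path))
    return [os.path.basename(p) for p in paths]
```

Calling `bundle_rsl_files("/path/to/run")` from any directory should leave a single archive that can be attached to a post.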
 
Hi, I have attached the files.
The model seems to be crashing at the d03 time step.
 

Attachments

  • rsl.tar.gz
    49.9 KB
Thanks for sending those. Essentially, the model is crashing immediately, before any real integration happens. The innermost domain is the first domain that integrates forward, which is why you are seeing this on d03. Take a look at this FAQ that discusses the most common reasons for segmentation faults (which is the error that shows up in some of the rsl.error.* files). I don't see any CFL errors, so you can ignore that section, but pay attention to the part that discusses the model stopping immediately.
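
As a rough illustration of that check, the Python sketch below scans the rsl.error.* files for segmentation-fault and CFL messages. The search patterns are assumptions: the exact wording of a segmentation-fault message depends on the compiler and MPI library, so adjust them to what your files actually contain. The function name `scan_rsl_files` is hypothetical.

```python
import glob
import os
import re

# Assumed message fragments; the real wording varies by compiler and MPI stack.
PATTERNS = {
    "segfault": re.compile(r"segmentation fault|SIGSEGV", re.IGNORECASE),
    "cfl": re.compile(r"\bcfl\b", re.IGNORECASE),
}


def scan_rsl_files(run_dir="."):
    """Return {pattern_name: [file names whose text matches that pattern]}."""
    hits = {name: [] for name in PATTERNS}
    for path in sorted(glob.glob(os.path.join(run_dir, "rsl.error.*"))):
        # errors="replace" keeps the scan going if a file holds odd bytes.
        with open(path, errors="replace") as fh:
            text = fh.read()
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                hits[name].append(os.path.basename(path))
    return hits
```

An empty "cfl" list together with a non-empty "segfault" list would match the situation described above.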

I also notice that you are running many different options (several different diagnostics outputs, sst_update, etc). If you determine you do not have any problems with your input data, I would suggest trying to run this with the default namelist.input file that comes with the model code (I'll attach it in case you no longer have the original), and just modifying the dates, times, and domain size/position specs (and don't modify anything else) to see if that runs any further. If so, then you can slowly try to add in some of the other options you want to use to see if you can figure out which one is causing the issue.
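
To see which options differ from the default file, a small comparison script can help. This is only a sketch under simplifying assumptions: one `key = value` entry per line, groups opened by `&name` and closed by `/`, and no `!` characters inside quoted strings. The names `parse_namelist` and `diff_namelists` are hypothetical; a dedicated namelist parser would be more robust.

```python
def _normalize(value):
    """Collapse spacing and trailing commas so '9,  9, 9,' equals '9,9,9'."""
    return ",".join(part.strip() for part in value.split(",") if part.strip())


def parse_namelist(text):
    """Parse namelist text into {group: {key: normalized value string}}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.startswith("&"):
            current = line[1:].strip().lower()
            groups[current] = {}
        elif line == "/":
            current = None
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            groups[current][key.strip().lower()] = _normalize(value)
    return groups


def diff_namelists(default, custom):
    """List (group, key, default_value, custom_value) for added or changed keys."""
    changes = []
    for group, entries in custom.items():
        base = default.get(group, {})
        for key, value in entries.items():
            if key not in base:
                changes.append((group, key, None, value))
            elif base[key] != value:
                changes.append((group, key, base[key], value))
    return changes
```

Reading both files with `open(...).read()` and passing the parsed results to `diff_namelists` gives a list of candidate options to add back one at a time.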
 

Attachments

  • namelist.input.413.orig.txt
    3.9 KB
Hi,
Is there any special static input data requirement with BouLac scheme? I have no problems running other schemes with the same data.
 
No, there shouldn't be any specific requirement for the BouLac scheme, and at this time we aren't aware of any issues with it, so perhaps it's not an issue with the input data. In that case, I'd recommend first trying to run this with the latest version of WRF (v4.3.3). If that doesn't work, then try to run with the default namelist, modifying only the date, time, and domain configuration, without changing any physics or adding any other options, to see if it runs. If it completes without errors, then we know the BouLac scheme is capable of running, and that perhaps one of the other options, combined with BouLac, is causing the issue. You can follow the same method I mentioned above: add one new namelist option at a time and see whether each run succeeds. Since your simulation is stopping immediately, these should be very quick tests. Please let me know what you discover.
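
The one-option-at-a-time approach can be sketched in Python as follows. Nothing here is part of WRF: `runs_ok` is a hypothetical callback standing in for however you write the namelist, launch wrf.exe, and decide whether the run survived, and the configurations are plain dictionaries used for illustration.

```python
def incremental_configs(base, additions):
    """Yield (option_name, config) pairs, adding one option per step.

    base      -- dict of options known to run (e.g. the default namelist values)
    additions -- ordered list of (option_name, value) pairs to add back
    """
    current = dict(base)
    for key, value in additions:
        current = {**current, key: value}
        yield key, dict(current)


def first_failing_option(base, additions, runs_ok):
    """Return the first option whose addition makes runs_ok(config) False.

    runs_ok is a hypothetical user-supplied callable that runs the model
    with the given config and returns True on success. Returns None if
    every step succeeds.
    """
    for key, config in incremental_configs(base, additions):
        if not runs_ok(config):
            return key
    return None
```

Because the options accumulate, the option returned is the first one that fails in combination with everything added before it, which is the interaction the test above is meant to expose.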
 
Hi,
Sorry for responding so late. There were issues with the HPC.
I haven't been able to test much, but I tried running with v4.3.3 and it failed as well. The rsl files are attached. So I guess there's some issue with the input data, but I have been able to run 8 other schemes with the same data.

Will try further if possible.
 

Attachments

  • rsl.zip
    360.7 KB
Thanks for following up. I'd recommend trying the basic namelist test I mentioned in my April 11 post: keep everything very basic, modifying only the dates and domain dimensions and turning on the BouLac scheme, to see if that runs. I don't think it's a data issue if you were able to run other schemes with your data (even in previous WRF versions).
 