
WRF Restart Won't Run

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

sym04110

Hello,

I am trying to restart a WRF run on Cheyenne. I'm currently running WRF4.1.4 with polar optimization. The run I'm trying to restart ran successfully until I hit the walltime limit. I've changed both the 'restart' field and the start time in my namelist.input. When I try to run wrf.exe after this to restart the run, I get the following in my rsl.error files:

Code:
taskid: 19 hostname: r1i3n16
 module_io_quilt_old.F        2931 F
Quilting with   1 groups of   0 I/O tasks.
 Ntasks in X            4 , ntasks in Y            5
--- WARNING: Goddard radiation and Goddard 4ice microphysics are not used together
--- WARNING: These options may be best to use together.
--- WARNING: Goddard radiation and Goddard 4ice microphysics are not used together
--- WARNING: These options may be best to use together.
--- WARNING: Goddard radiation and Goddard 4ice microphysics are not used together
--- WARNING: These options may be best to use together.
WRF V4.1.4 MODEL
 *************************************
 Parent domain
 ids,ide,jds,jde            1         100           1         100
 ims,ime,jms,jme           69         105          74         105
 ips,ipe,jps,jpe           76         100          81         100
 *************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
   alloc_space_field: domain            1 ,               36708628  bytes allocated
forrtl: severe (66): output statement overflows record, unit -5, file Internal List-Directed Write
Image              PC                Routine            Line        Source             
wrf.exe            00000000033DBFFE  Unknown               Unknown  Unknown
wrf.exe            000000000342EF1D  Unknown               Unknown  Unknown
wrf.exe            00000000013D41D8  Unknown               Unknown  Unknown
wrf.exe            00000000004073DC  Unknown               Unknown  Unknown
wrf.exe            00000000004061B5  Unknown               Unknown  Unknown
wrf.exe            000000000040615E  Unknown               Unknown  Unknown
libc.so.6          00002ACC68E9D6E5  __libc_start_main     Unknown  Unknown
wrf.exe            0000000000406069  Unknown               Unknown  Unknown

This is my current namelist.input file:
Code:
&time_control
 run_days                               = 0,
 run_hours                              = 264,
 run_minutes                            = 0,
 run_seconds                            = 0,
 start_year                             = 2015, 2015, 2015,
 start_month                            = 02,   02,   02,
 start_day                              = 05,   05,   05,
 start_hour                             = 00,   00,   00,
 end_year                               = 2015, 2015, 2015,
 end_month                              = 02,   02,   02,
 end_day                                = 20,   20,   20,
 end_hour                               = 00,   00,   00,
 interval_seconds                       = 21600
 input_from_file                        = .true.,.true.,.true.,
 history_interval                       = 180,  60,   60,
 frames_per_outfile                     = 1000, 1000, 1000,
 restart                                = .true.,
 restart_interval                       = 2880,
 io_form_history                        = 2
 io_form_restart                        = 2
 io_form_input                          = 2
 io_form_boundary                       = 2
 history_outname                        = '/glade/work/sarahm/wrfout/wrfout_d<domain>_<date>',
 rst_inname                             = '/glade/work/sarahm/wrfout/wrfrst_d<domain>_<date>',
 rst_outname                            = '/glade/work/sarahm/wrfout/wrfrst_d<domain>_<date>',
/

&domains
 time_step                              = 120,
 time_step_fract_num                    = 0,
 time_step_fract_den                    = 1,
 max_dom                                = 3,
 e_we                                   = 100,    100,   160,
 e_sn                                   = 100,    100,    160,
 e_vert                                 = 33,    33,    33,
 p_top_requested                        = 5000
 num_metgrid_levels                     = 38,
 dx                                     = 27000, 9000,  3000.,
 dy                                     = 27000, 9000,  3000.,
 grid_id                                = 1,     2,     3,
 parent_id                              = 0,     1,     2,
 i_parent_start                         = 1,     35,    16,
 j_parent_start                         = 1,     35,    40,
 parent_grid_ratio                      = 1,     3,     3,
 parent_time_step_ratio                 = 1,     3,     3,
 feedback                               = 1,
 smooth_option                          = 0,
/

&physics
 mp_physics                             = 7, 7, 7,
 bl_pbl_physics                         = 1, 1, 1,
 gsfcgce_hail                           = 0,
 gsfcgce_2ice                           = 0,
 co2tf                                  = 1,
 cu_physics                             = 3, 3, 3,
 cudt                                   = 5, 5, 5,
 icloud                                 = 1,
 ra_lw_physics                          = 4, 4, 4,
 ra_sw_physics                          = 4, 4, 4,
 radt                                   = 20,
 slope_rad                              = 0,
 topo_shading                           = 0,
 sf_sfclay_physics                      = 91, 91, 91,
 sf_surface_physics                     = 2, 2, 2,
 bldt                                   = 0,
 isfflx                                 = 1,
 ifsnow                                 = 0,
 surface_input_source                   = 1,
 num_land_cat                           = 24,
/

&fdda
/

&dynamics
 hybrid_opt                             = 2,
 w_damping                              = 0,
 diff_opt                               = 1,      1,      1,
 km_opt                                 = 4,      4,      4,
 diff_6th_opt                           = 0,      0,      0,
 diff_6th_factor                        = 0.12,   0.12,   0.12,
 base_temp                              = 290.
 damp_opt                               = 3,
 zdamp                                  = 5000.,  5000.,  5000.,
 dampcoef                               = 0.2,    0.2,    0.2
 khdif                                  = 0,      0,      0,
 kvdif                                  = 0,      0,      0,
 non_hydrostatic                        = .true., .true., .true.,
 moist_adv_opt                          = 1,      1,      1,
 scalar_adv_opt                         = 1,      1,      1,
 gwd_opt                                = 1,
 /

&bdy_control
 spec_bdy_width                         = 5,
 specified                              = .true.
/

 &grib2
 /

&namelist_quilt
 nio_tasks_per_group                    = 0,
 nio_groups                             = 1,
/

Does anyone have any ideas why I could be running into issues restarting my WRF run?

Thank you!
Sarah
 
Hi Sarah,
Can you let me know the path to the location where you are trying to run this on Cheyenne? Thanks!
 
Hi,
Thanks for sending that. I think the problem may be related to disk space in the directory where you are trying to write the files. You are sending the files to your /glade/work/ directory, which has a lot less space than /glade/scratch. When I use your exact namelist, your wrfrst* files, and wrfbdy_d01 file, it's running okay for me on Cheyenne. Test out removing the line
Code:
history_outname                        = '/glade/work/sarahm/wrfout/wrfout_d<domain>_<date>',
to see if that makes any difference.
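A quick way to test the disk-space idea before rerunning is to look at the free space where the output is being written. The following is only a rough sketch using Python's standard library; the helper names `free_gib` and `has_room` are made up for illustration, and the path `"."` is a placeholder for your actual output directory. Note that this reports free space on the filesystem as a whole, so it may not reflect a per-user quota; the system's own quota report is the authoritative check.

```python
import shutil

def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 1024**3

def has_room(path, needed_gib):
    """True if at least `needed_gib` GiB are free where `path` lives."""
    return free_gib(path) >= needed_gib

# Placeholder path: replace "." with the directory in history_outname / rst_outname.
out_dir = "."
print(f"{out_dir}: {free_gib(out_dir):.1f} GiB free")
```

If the number is small compared with the size of the wrfrst and wrfout files from the earlier part of the run, pointing the output at a larger filesystem is worth trying.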
 
Hi,

Thanks for the suggestion. I changed all output to be put in my /glade/scratch directory and moved any output I had in /glade/work to my scratch directory. I'm still getting this error. I was wondering if it had to do with the installation of the polar component, but I've tracked back and even without the polar files I'm still getting issues when I try to restart my run. Do you have any other suggestions for potential causes I could explore?

I'm also curious if the restart is necessary. I realize that using the restart eliminates the need for spinup, but how significant and for how long into the run are these differences? I'm trying to run a number of ~4-month runs in the Arctic and with my current domain setup (with two nests) I only complete about 6 days of my simulation per 12-hour walltime period. How would you suggest I proceed? Do you have any suggestions on how to make dealing with the 12-hour walltime limit more efficient when running WRF?

Thank you again for all your help,
Sarah
 
Hi Sarah,
I'm still not sure why you are getting this error when you are restarting. I did a test with your wrfrst_d0*_2015-02-05_00:00:00 files and your wrfbdy_d01 file, along with your namelist and V4.1.4 and I'm not having any problems running the restart. I don't know if it's because of our code differences, but it would perhaps be worth checking to see if you are able to run this with a non-modified version of the code. You can grab the pre-compiled version from ~wrfhelp/PRE_COMPILED_CODE/WRFV4.1.4_intel_dmpar

If you are interested in seeing my run, it can be found in /glade/scratch/kkeene/sarahm/wrfv414/test/em_real

As for a way around the wallclock time, unfortunately restarts are going to have to be the way to go. Some of your options (specifically physics) may be causing your run to take longer, but if those are the ones you want to use, then that just may be how long it takes. I think you could probably increase the number of processors a bit. I'm running with 36 (1 node) and it seems to be okay.
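Since restarts are the way to go, it can help to lay out the chain of jobs ahead of time. Below is a minimal sketch, not an official WRF tool: the function `plan_segments` and its parameters are hypothetical, and it assumes roughly 6 simulated days fit in one 12-hour job, as reported earlier in this thread. It also assumes each job must begin at a time for which a restart file exists, which is why the chunk length is checked against `restart_interval` (2880 minutes in the posted namelist).

```python
from datetime import datetime, timedelta

def plan_segments(start, end, days_per_job, restart_interval_min=2880):
    """Split [start, end) into consecutive chunks of at most `days_per_job` days.

    Raises ValueError if the chunk length is not a multiple of the restart
    interval, since a restart file would then be missing at a chunk boundary.
    """
    if days_per_job <= 0:
        raise ValueError("days_per_job must be positive")
    if (days_per_job * 1440) % restart_interval_min != 0:
        raise ValueError("chunk length must be a multiple of restart_interval")
    step = timedelta(days=days_per_job)
    segments = []
    t = start
    while t < end:
        nxt = min(t + step, end)  # last chunk may be shorter
        segments.append((t, nxt))
        t = nxt
    return segments

# Dates taken from the posted namelist; 6 days per job is the figure quoted above.
segments = plan_segments(datetime(2015, 2, 5), datetime(2015, 2, 20), 6)
for i, (s, e) in enumerate(segments):
    hours = int((e - s).total_seconds() // 3600)
    restart = ".false." if i == 0 else ".true."  # only the first job is a cold start
    print(f"job {i}: restart = {restart}  start = {s:%Y-%m-%d_%H:%M:%S}  "
          f"end = {e:%Y-%m-%d_%H:%M:%S}  ({hours} h)")
```

By the same arithmetic, a 120-day simulation at 6 days per job works out to 20 jobs, so it is worth scripting the namelist changes between jobs rather than editing them by hand.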
 
Thank you! I finally got the restart to work!

I copied the WRF installation in your scratch directory (/glade/scratch/kkeene/sarahm/wrfv414/) to my work directory. From there I was able to replicate what you did and successfully restart the run. I then replaced the necessary files with the polar modified ones and was still able to do a restart.

Thank you for all the time spent troubleshooting this with me!
 