Xinxu Zhao
Hi All,
My simulation is based on WRF v3.9.1.1. Since I want to run another long test (~2 years), I would like to use more processors (i.e., more nodes) to speed up the run. The key time-step and domain-size settings in namelist.input are listed in the namelist below.
For my previous runs I always used 12 nodes (28 cores per node, i.e., 336 MPI tasks), and they ran smoothly. The beginning of rsl.error.0000 from such a run is shown below, after the namelist.
I then tried to run the case on more than 12 nodes. With 24 nodes (672 MPI tasks), wrf.exe just hangs at the last line of the rsl.error.0000 output shown further below: there is no error and no progress until the job runs out of wall time. The run seems to hang when nesting into d02.
I also tried other node counts (e.g., 14, 16, 20), which do not work either. Do you have any idea why this hang happens, and how I could debug it?
Code:
&time_control
run_days = 0,
run_hours = 24,
run_minutes = 0,
run_seconds = 0,
start_year = 2018, 2018, 2018,
start_month = 08, 08, 08,
start_day = 01, 01, 01,
start_hour = 00, 00, 00,
start_minute = 00, 00, 00,
start_second = 00, 00, 00,
end_year = 2018, 2018, 2018,
end_month = 08, 08, 08,
end_day = 02, 02, 02,
end_hour = 00, 00, 00,
end_minute = 00, 00, 00,
end_second = 00, 00, 00,
interval_seconds = 21600,
input_from_file = .true.,.true.,.true.,
history_interval = 180, 60, 15,
frames_per_outfile = 1000, 1000, 1000,
write_hist_at_0h_rst = .true.,
restart = .false.,
restart_interval = 360,
override_restart_timers = .true.,
rst_outname = 'wrfrst_d<domain>_<date>',
history_outname = 'wrfout_d<domain>_<date>',
auxinput1_inname = 'met_em.d<domain>.<date>',
io_form_history = 2,
io_form_restart = 2,
io_form_input = 2,
io_form_boundary = 2,
io_form_restart = 2,
debug_level = 1,
auxinput15_inname = 'vprm_input_d<domain>_<date>',
io_form_auxinput15 = 2,
auxinput15_interval_m = 1800, 1800, 1800,
frames_per_auxinput15 = 1, 1, 1,
auxinput5_inname = 'wrfchemi_d<domain>_<date>',
io_form_auxinput5 = 2,
auxinput5_interval_m = 60, 60, 60,
frames_per_auxinput5 = 1, 1, 1,
/
&domains
time_step = 30,
time_step_fract_num = 0,
time_step_fract_den = 1,
max_dom = 3,
e_we = 150, 126, 196,
e_sn = 150, 126, 196,
e_vert = 46, 46, 46,
p_top_requested = 5000,
num_metgrid_levels = 138,
num_metgrid_soil_levels = 4,
dx = 10000, 2000, 400,
dy = 10000, 2000, 400,
grid_id = 1, 2, 3,
parent_id = 1, 1, 2,
i_parent_start = 1, 63, 44,
j_parent_start = 1, 63, 44,
parent_grid_ratio = 1, 5, 5,
parent_time_step_ratio = 1, 5, 5,
feedback = 0,
smooth_option = 0
eta_levels = 1.000,0.998,0.996,0.994,0.992,0.990,0.988,0.984,0.980,0.976,0.970,0.964,
0.958,0.952,0.945,0.938,0.930,0.922,0.914,0.904,0.894,0.884,0.874,0.860,0.846,
0.832,0.818,0.800,0.775,0.750,0.720,0.700,0.650,0.600,0.550,0.500,
0.450,0.400,0.350,0.300,0.250,0.200,0.150,0.100,0.050,0.000,
/
&physics
mp_physics = 3, 3, 3,
ra_lw_physics = 4, 4, 4,
ra_sw_physics = 4, 4, 4,
radt = 30, 30, 30,
sf_sfclay_physics = 2, 2, 2,
sf_surface_physics = 2, 2, 2,
bl_pbl_physics = 2, 2, 2,
bldt = 0, 0, 0,
cu_physics = 3, 0, 0,
cudt = 5, 5, 5,
cu_diag = 1, 0, 0,
isfflx = 1,
ifsnow = 0,
icloud = 1,
surface_input_source = 1,
num_soil_layers = 4,
num_land_cat = 40,
sf_urban_physics = 2, 2, 2,
maxiens = 1,
maxens = 3,
maxens2 = 3,
maxens3 = 16,
ensdim = 144,
/
&chem
chem_opt = 17,17,17,
emiss_opt = 17,17,17,
vprm_opt = 'VPRM_table_EUROPE','VPRM_table_EUROPE','VPRM_table_EUROPE',
phot_opt = 0,0,0,
chem_in_opt = 0,0,0,
io_style_emissions = 2,
kemit = 7,
chemdt = 1.,1.,1.,
bioemdt = 30,30,30,
photdt = 30,30,30,
gas_drydep_opt = 0,0,0,
aer_drydep_opt = 0,0,0,
bio_emiss_opt = 17,17,17,
emiss_inpt_opt = 16,16,16,
biomass_burn_opt = 0,0,0,
plumerisefire_frq = 0,0,0,
gas_bc_opt = 1,1,1,
gas_ic_opt = 1,1,1,
aer_bc_opt = 1,1,1,
aer_ic_opt = 1,1,1,
gaschem_onoff = 0,0,0,
aerchem_onoff = 0,0,0,
vertmix_onoff = 1,1,1,
chem_conv_tr = 1,0,0,
have_bcs_chem = .true,.true,.true,
have_bcs_tracer = .true,.true,.true,
aer_ra_feedback = 0,0,0,
wetscav_onoff = 0,0,0,
cldchem_onoff = 0,0,0,
conv_tr_wetscav = 0,0,0,
/
&fdda
/
&dynamics
w_damping = 0,
diff_opt = 2, 2, 2,
km_opt = 4, 4, 4,
diff_6th_opt = 2, 2, 2,
diff_6th_factor = 0.12, 0.12, 0.12,
base_temp = 290.
damp_opt = 0,
zdamp = 5000., 5000., 5000.,
dampcoef = 0.2, 0.2, 0.2
khdif = 0, 0, 0,
kvdif = 0, 0, 0,
non_hydrostatic = .true., .true., .true.,
moist_adv_opt = 1, 1, 1,
scalar_adv_opt = 1, 1, 1,
/
&bdy_control
spec_bdy_width = 5,
spec_zone = 1,
relax_zone = 4,
specified = .true., .false.,.false.,
nested = .false., .true., .true.,
/
&grib2
/
&namelist_quilt
nio_tasks_per_group = 0,
nio_groups = 1,
/
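As a quick cross-check of the time-step and domain-size settings in the namelist above, here is a minimal Python sketch that derives the per-domain time step and grid spacing from `time_step`, `dx`, `parent_id`, `parent_grid_ratio` and `parent_time_step_ratio`. It assumes each nest simply divides its parent's values by the corresponding ratio; the function name `nest_steps` is hypothetical and not part of WRF.

```python
def nest_steps(time_step, dx, parent_id, grid_ratio, time_ratio):
    """Return per-domain (dt in s, dx in m), assuming nest = parent / ratio."""
    dts, dxs = [], []
    for d in range(len(parent_id)):
        if d == 0:
            # Domain 1 uses time_step and dx directly.
            dts.append(float(time_step))
            dxs.append(float(dx))
        else:
            p = parent_id[d] - 1  # namelist ids are 1-based
            dts.append(dts[p] / time_ratio[d])
            dxs.append(dxs[p] / grid_ratio[d])
    return dts, dxs


# Values copied from the &domains section above.
dts, dxs = nest_steps(
    time_step=30,
    dx=10000,
    parent_id=[1, 1, 2],
    grid_ratio=[1, 5, 5],
    time_ratio=[1, 5, 5],
)
for d, (dt, spacing) in enumerate(zip(dts, dxs), start=1):
    print(f"d{d:02d}: dt = {dt:g} s, dx = {spacing:g} m")
```

The derived grid spacings can be compared against the `dx` row of the namelist (10000, 2000, 400) to confirm the ratios are consistent.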
The beginning of rsl.error.0000 for the 12-node run:
Code:
taskid: 0 hostname: i22r01c06s10
module_io_quilt_old.F 2931 F
Quilting with 1 groups of 0 I/O tasks.
Ntasks in X 16 , ntasks in Y 21
In the rsl.error.0000 of the 24-node run:
Code:
taskid: 0 hostname: i22r01c01s07
module_io_quilt_old.F 2931 F
Quilting with 1 groups of 0 I/O tasks.
Ntasks in X 24 , ntasks in Y 28
--- WARNING: traj_opt is zero, but num_traj is not zero; setting num_traj to zero.
--- NOTE: sst_update is 0, setting io_form_auxinput4 = 0 and auxinput4_interval = 0 for all domains
--- NOTE: sst_update is 0, setting io_form_auxinput4 = 0 and auxinput4_interval = 0 for all domains
--- NOTE: sst_update is 0, setting io_form_auxinput4 = 0 and auxinput4_interval = 0 for all domains
--- NOTE: grid_fdda is 0 for domain 1, setting gfdda interval and ending time to 0 for that domain.
--- NOTE: both grid_sfdda and pxlsm_soil_nudge are 0 for domain 1, setting sgfdda interval and ending time to 0 for that domain.
--- NOTE: obs_nudge_opt is 0 for domain 1, setting obs nudging interval and ending time to 0 for that domain.
--- NOTE: grid_fdda is 0 for domain 2, setting gfdda interval and ending time to 0 for that domain.
--- NOTE: both grid_sfdda and pxlsm_soil_nudge are 0 for domain 2, setting sgfdda interval and ending time to 0 for that domain.
--- NOTE: obs_nudge_opt is 0 for domain 2, setting obs nudging interval and ending time to 0 for that domain.
--- NOTE: grid_fdda is 0 for domain 3, setting gfdda interval and ending time to 0 for that domain.
--- NOTE: both grid_sfdda and pxlsm_soil_nudge are 0 for domain 3, setting sgfdda interval and ending time to 0 for that domain.
--- NOTE: obs_nudge_opt is 0 for domain 3, setting obs nudging interval and ending time to 0 for that domain.
--- NOTE: bl_pbl_physics /= 4, implies mfshconv must be 0, resetting
Need MYNN PBL for icloud_bl = 1, resetting to 0
*************************************
No physics suite selected.
Physics options will be used directly from the namelist.
*************************************
--- NOTE: RRTMG radiation is in use, setting: levsiz=59, alevsiz=12, no_src_types=6
--- NOTE: num_soil_layers has been set to 4
WRF V3.9.1.1 MODEL
*************************************
Parent domain
ids,ide,jds,jde 1 150 1 150
ims,ime,jms,jme -4 14 -4 13
ips,ipe,jps,jpe 1 7 1 6
*************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
alloc_space_field: domain 1 , 58299468 bytes allocated
wrf main: calling open_r_dataset for wrfinput
med_initialdata_input: calling input_input
mminlu = 'MODIFIED_IGBP_MODIS_NOAH'
Timing for processing wrfinput file (stream 0) for domain 1: 7.42752 elapsed seconds
Max map factor in domain 1 = 0.98. Scale the dt in the model accordingly.
WRF TILE 1 IS 1 IE 7 JS 1 JE 6
set_tiles3: NUMBER OF TILES = 1
INPUT LandUse = "MODIFIED_IGBP_MODIS_NOAH"
LANDUSE TYPE = "MODIFIED_IGBP_MODIS_NOAH" FOUND 40 CATEGORIES 2 SEASONS WATER CATEGORY = 17 SNOW CATEGORY = 15
Do not have ozone. Must read it in.
Master rank reads ozone.
Broadcast ozone to other ranks.
INITIALIZE THREE Noah LSM RELATED TABLES
Skipping over LUTYPE = USGS
LANDUSE TYPE = MODIFIED_IGBP_MODIS_NOAH FOUND 20 CATEGORIES
INPUT SOIL TEXTURE CLASSIFICATION = STAS
SOIL TEXTURE CLASSIFICATION = STAS FOUND 19 CATEGORIES
*********************************************************************
* PROGRAM:WRF-Chem V3.9.1.1 MODEL
* *
* PLEASE REPORT ANY BUGS TO WRF-Chem HELP at *
* *
* wrfchemhelp.gsd@noaa.gov *
* *
*********************************************************************
WARNING: Users interested in the GHG options should check the comments/references in header of module_ghg_fluxes
Warning: the VPRM parameters may need to be optimized depending on the season, year and region!
The parameters provided here should be used for testing purposes only!
*************************************
Nesting domain
ids,ide,jds,jde 1 126 1 126
ims,ime,jms,jme -4 20 -4 15
ips,ipe,jps,jpe 1 6 1 5
INTERMEDIATE domain
ids,ide,jds,jde 61 91 61 91
ims,ime,jms,jme 56 73 56 73
ips,ipe,jps,jpe 59 63 59 63
*************************************
d01 2018-08-01_18:00:00 alloc_space_field: domain 2 , 7659360 bytes allocated
d01 2018-08-01_18:00:00 alloc_space_field: domain 2 , 82253220 bytes allocated
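To relate the "Ntasks in X / ntasks in Y" lines to the domain sizes, here is a rough Python sketch that estimates how many grid points each MPI task receives per direction. It assumes the patch width is about `ceil(e_we / ntasks_x)`; this is only an estimate, not WRF's actual decomposition routine, and the function names are hypothetical. For the 24-node run the estimate agrees with the rank-0 `ips,ipe,jps,jpe` lines in the log above (7 x 6 points for d01, 6 x 5 for d02).

```python
import math


def mpi_tasks(nodes, cores_per_node):
    """Total MPI tasks, assuming one task per core."""
    return nodes * cores_per_node


def patch_estimate(e_we, e_sn, ntasks_x, ntasks_y):
    """Approximate grid points per task in x and y (assumed ceil split)."""
    return math.ceil(e_we / ntasks_x), math.ceil(e_sn / ntasks_y)


# Domain sizes from the namelist; task layouts from the two logs.
domains = {"d01": 150, "d02": 126, "d03": 196}
layouts = {"12 nodes": (16, 21), "24 nodes": (24, 28)}

for label, (nx, ny) in layouts.items():
    print(f"{label}: {nx * ny} MPI tasks")
    for name, n in domains.items():
        px, py = patch_estimate(n, n, nx, ny)
        print(f"  {name}: about {px} x {py} points per task")
```

The per-task patch sizes can then be compared with the 25 points per side implied by the rule of thumb in the linked thread.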
Additionally, based on the information in this thread: https://forum.mmm.ucar.edu/phpBB3/viewtopic.php?t=5082, I am also trying to work out an appropriate number of processors for my runs.
But the recommended processor counts seem too low for my domains, i.e.,
For your smallest-sized domain:
((e_we)/25) * ((e_sn)/25) = most amount of processors you should use
e_we=126
e_sn=126
So the maximum number of processors is around 25.
For your largest-sized domain:
((e_we)/100) * ((e_sn)/100) = least amount of processors you should use
e_we=196
e_sn=196
So the minimum number of processors is around 4. These limits do not make sense for my runs. I would like to know whether there is something wrong with my understanding.
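The two rule-of-thumb formulas quoted above can be evaluated directly. A minimal Python sketch, assuming "smallest" and "largest" are judged by `e_we * e_sn` across all domains in the namelist (the function name `processor_bounds` is hypothetical):

```python
def processor_bounds(e_we, e_sn):
    """Return (least, most) processors from the quoted rule of thumb."""
    sizes = list(zip(e_we, e_sn))
    small = min(sizes, key=lambda s: s[0] * s[1])
    large = max(sizes, key=lambda s: s[0] * s[1])
    # Smallest domain: at least about 25 x 25 points per processor.
    most = (small[0] / 25) * (small[1] / 25)
    # Largest domain: at most about 100 x 100 points per processor.
    least = (large[0] / 100) * (large[1] / 100)
    return least, most


# e_we and e_sn for d01, d02, d03 from the namelist.
least, most = processor_bounds([150, 126, 196], [150, 126, 196])
print(f"least: about {least:.1f}, most: about {most:.1f}")
```

With these domain sizes the sketch reproduces the figures of about 4 and about 25 processors given above.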
Thx a lot!
Xinxu