Hi everyone. I am running WRF on an HPC cluster. WRF runs for a day and then immediately gives an error.
When I use this Slurm script:
Code:
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --mem=32000
#SBATCH --time=168:00:00
#SBATCH --partition=batch
# Set stack size to unlimited
ulimit -s unlimited
# Place commands to load environment modules here
module load wrf/3.9.1-intel-mpich
# MAIN
mpirun -n 64 real.exe && mpirun -n 64 wrf.exe
WRF gives an error telling me to reduce the number of processors:
Code:
------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 645
Submit the real program again with fewer processors
-------------------------------------------
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
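A rough sketch of the arithmetic behind this message, using the coarsest domain from the namelist (e_we = 26, e_sn = 37). It assumes the MPI tasks are split into a near-square process grid; that is an assumption for illustration and not necessarily the exact decomposition WRF uses. The function names `near_square_factors` and `patch_size` are hypothetical helpers, not part of WRF.

```python
import math


def near_square_factors(ntasks):
    """Return (nx, ny) with nx * ny == ntasks and nx <= ny, as close to square as possible."""
    nx = math.isqrt(ntasks)
    while ntasks % nx != 0:
        nx -= 1
    return nx, ntasks // nx


def patch_size(e_we, e_sn, ntasks):
    """Approximate grid points per process in each direction (integer division).

    Assumption: a near-square process grid; the real model may split differently.
    """
    nx, ny = near_square_factors(ntasks)
    return e_we // nx, e_sn // ny


if __name__ == "__main__":
    # Domain 1 from the namelist: e_we = 26, e_sn = 37
    for ntasks in (64, 16):
        print(ntasks, patch_size(26, 37, ntasks))
```

Under that assumption, 64 tasks leave only about 3 x 4 grid points per process on domain 1, and 16 tasks leave about 6 x 9.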
I reduced the number of processors to 16, but then I get:
Code:
Assertion failed in file src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 596: hdr.pkt_type == MPIDI_NEM_TCP_SOCKSM_PKT_ID_INFO || hdr.pkt_type == MPIDI_NEM_TCP_SOCKSM_PKT_TMPVC_INFO.
What do you think the problem is? Here is my namelist.input, by the way:
Code:
&time_control
run_days = 31,
run_hours = 0,
run_minutes = 0,
run_seconds = 0,
start_year = 2016, 2016, 2016, 2016,
start_month = 12, 12, 12, 12,
start_day = 1, 1, 1, 1,
start_hour = 00, 00, 00, 00,
start_minute = 00, 00, 00, 00,
start_second = 00, 00, 00, 00,
end_year = 2016, 2016, 2016, 2016,
end_month = 12, 12, 12, 12,
end_day = 31, 31, 31, 31,
end_hour = 17, 17, 17, 17,
end_minute = 00, 00, 00, 00,
end_second = 00, 00, 00, 00,
interval_seconds = 21600,
input_from_file = .true., .true., .true.,
history_interval = 180, 60, 60, 60,
frames_per_outfile = 1, 1, 1, 1,
restart = .false.,
restart_interval = 5000,
io_form_history = 2,
io_form_restart = 2,
io_form_input = 2,
io_form_boundary = 2,
debug_level = 0,
auxhist2_outname = "winds_d<domain>_<date>",
auxhist3_interval = 0,0,15,15,
io_form_auxhist2 = 2,
frames_per_auxhist2 = 1
/
&domains
eta_levels = 1.000, 0.9947, 0.9895, 0.9843, 0.979,
0.9739, 0.9684, 0.9626, 0.9564, 0.9498,
0.9426, 0.9348, 0.9262, 0.9167, 0.9062,
0.8946, 0.8816, 0.8671, 0.8509, 0.833,
0.813, 0.7909, 0.7667, 0.7402, 0.7116,
0.6809, 0.6483, 0.6141, 0.5785, 0.5419,
0.5047, 0.4672, 0.4299, 0.3931, 0.357,
0.322, 0.2883, 0.256, 0.2253, 0.1963,
0.169, 0.1435, 0.1171, 0.0952, 0.0753,
0.0571, 0.0407, 0.0257, 0.0122, 0.000,
time_step = 125,
time_step_fract_num = 0,
time_step_fract_den = 1,
max_dom = 4,
e_we = 26, 76, 276, 456,
e_sn = 37, 126, 231, 671,
e_vert = 50, 50, 50, 50,
p_top_requested = 5000,
num_metgrid_levels = 32,
num_metgrid_soil_levels = 4,
dx = 25000, 5000, 1000, 200,
dy = 25000, 5000, 1000, 200,
grid_id = 1, 2, 3, 4,
parent_id = 1, 1, 2, 3,
i_parent_start = 1, 6, 12, 121,
j_parent_start = 1, 6, 17, 52,
parent_grid_ratio = 1, 5, 5, 5,
parent_time_step_ratio = 1, 5, 5, 5,
feedback = 1,
smooth_option = 0,
/
&physics
mp_physics = 6, 6, 6, 6,
ra_lw_physics = 1, 1, 1, 1,
ra_sw_physics = 1, 1, 1, 1,
radt = 30, 30, 30, 30,
sf_sfclay_physics = 1, 1, 1, 1,
sf_surface_physics = 2, 2, 2, 2,
bl_pbl_physics = 1, 1, 1, 1,
bldt = 0, 0, 0, 0,
cu_physics = 1, 1, 0, 0,
cudt = 5, 5, 5, 2,
isfflx = 1,
ifsnow = 0,
icloud = 1,
surface_input_source = 1,
num_soil_layers = 4,
sf_urban_physics = 0, 0, 0, 0,
maxiens = 1,
maxens = 3,
maxens2 = 3,
maxens3 = 16,
ensdim = 144,
/
&fdda
/
&dynamics
w_damping = 1,
diff_opt = 1,
km_opt = 4,
diff_6th_opt = 0, 0, 0, 0,
diff_6th_factor = 0.12, 0.12, 0.12, 0.12,
base_temp = 290.,
damp_opt = 0,
zdamp = 5000., 5000., 5000., 5000,
dampcoef = 0.2, 0.2, 0.2, 0.2,
khdif = 0, 0, 0, 0,
kvdif = 0, 0, 0, 0,
non_hydrostatic = .true., .true., .true., .true.,
moist_adv_opt = 1, 1, 1, 1,
scalar_adv_opt = 1, 1, 1, 1,
/
&bdy_control
spec_bdy_width = 5,
spec_zone = 1,
relax_zone = 4,
specified = .true., .false., .false., .false.,
nested = .false., .true., .true., .true.,
/
&grib2
/
&namelist_quilt
nio_tasks_per_group = 0,
nio_groups = 1,
/