Bounced here from the GitHub issues section.

Describe the bug
Whenever I try to restart a simulation from any of the available restart points, the job apparently starts and keeps running until it is canceled, but no calculation is ever performed and no output file is produced. If I type the following:

```
grep "Timing for main" rsl.out.0000 | grep " 1:"
```

the output is always empty. In a properly running simulation the output usually has this form:

```
Timing for main: time 2022-03-27_21:41:24 on domain 1: 12.74597 elapsed seconds
Timing for main: time 2022-03-27_21:41:42 on domain 1: 12.73683 elapsed seconds
Timing for main: time 2022-03-27_21:42:00 on domain 1: 12.77265 elapsed seconds
Timing for main: time 2022-03-27_21:42:18 on domain 1: 12.74004 elapsed seconds
Timing for main: time 2022-03-27_21:42:36 on domain 1: 12.77451 elapsed seconds
Timing for main: time 2022-03-27_21:42:54 on domain 1: 12.79238 elapsed seconds
```

When checking rsl.out.000* and rsl.error.000*, the only clue I get is the following:

```
module_io.F: in wrf_read_field
Warning BAD MEMORY ORDER |ZZ| for |ISEEDARR_MULT3D| in ext_ncd_read_field wrf_io.F90
input_wrf.F reading 2d integer iseedarr_mult3d Status = -19
input_wrf.F reading 0d logical is_cammgmp_used
```

This appears at a rather early stage in rsl.out.0000; after that, the log shows the model keeps reading the wrf_restart files.
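In case it helps with the diagnosis, one way to look at how the variable named in the warning is actually stored in a restart file is to dump the NetCDF header (the file name below is only an example; substitute the restart time being used, and ncdump comes with the NetCDF tools):

```bash
# Dump only the NetCDF header of one restart file and pull out the
# declaration of the variable the warning complains about, plus a
# couple of its attribute lines. File name is an example, not the
# exact one from my run.
ncdump -h wrfrst_d01_2022-03-27_00:00:00 | grep -i -A 2 iseedarr_mult3d
```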
To Reproduce
Steps to reproduce the behavior. Relevant excerpt from namelist.input:

```
restart = .true.,
restart_interval = 4320,
io_form_history = 2
io_form_restart = 2
io_form_input = 2
io_form_boundary = 2
debug_level = 10000000,

&domains
time_step = 18,
time_step_fract_num = 0,
time_step_fract_den = 1,
max_dom = 4,
e_we = 151, 127, 85, 79
e_sn = 92, 118, 52, 76
e_vert = 85, 85, 85, 85,
p_top_requested = 10000,
num_metgrid_levels = 34,
num_metgrid_soil_levels = 4,
dx = 3000.0, 1000.0, 333.0, 333.0,
dy = 3000.0, 1000.0, 333.0, 333.0,
grid_id = 1, 2, 3, 4,
parent_id = 1, 1, 2, 2,
i_parent_start = 1, 16, 54, 20
j_parent_start = 1, 17, 60, 21
parent_grid_ratio = 1, 3, 3, 3,
parent_time_step_ratio = 1, 3, 3, 3,
feedback = 0,
smooth_option = 0
smooth_cg_topo = .true.

! physics_suite = 'CONUS'
mp_physics = 16, 16, 16, 16,
cu_physics = 1, 0, 0, 0,
ra_lw_physics = 4, 4, 4, 4,
ra_sw_physics = 4, 4, 4, 4,
bl_pbl_physics = 2, 2, 2, 2,
sf_sfclay_physics = 2, 2, 2, 2,
sf_surface_physics = 4, 4, 4, 4,
radt = 9, 9, 9, 9,
bldt = 0, 0, 0, 0,
cudt = 5, 0, 0, 0,
icloud = 0,
num_land_cat = 61,
sf_urban_physics = 3, 3, 3, 3,
use_wudapt_lcz = 1,
```
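With max_dom = 4, a complete set of wrfrst_d01–d04 files has to exist for the chosen restart time. The snippet below is just a sketch of how that can be verified from the run directory (the time string is an example, not my actual restart time):

```bash
# Sketch: confirm a restart file exists for every domain at the chosen
# restart time and that its Times variable really contains that time.
# RST_TIME is an example value; replace it with the actual restart time.
RST_TIME="2022-03-27_00:00:00"
for d in d01 d02 d03 d04; do
  f="wrfrst_${d}_${RST_TIME}"
  echo "== ${f} =="
  ls -lh "${f}" 2>/dev/null || { echo "MISSING"; continue; }
  ncdump -v Times "${f}" | tail -n 3
done
```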
Expected behavior
I would expect the simulation to resume from the restart point, as usual.

Attachments
The sbatch script used to run the job is pasted below (I can't see a proper attachment button anywhere):

```bash
#!/bin/bash
#SBATCH --qos bigmem
#SBATCH --mem=50000
#SBATCH -c 5
#SBATCH --nodes 1
##SBATCH --exclusive
#SBATCH --ntasks 5
#SBATCH --cpus-per-task 1
#SBATCH -t 15-00:00:00
#SBATCH -J Leman_August_50m

module load gcc openmpi

BASE=$HOME/software/wrf
PNETCDF_ROOT=$BASE/pnetcdf-install
NETCDF_ROOT=$BASE/netcdf-install
SZIP_ROOT=$BASE/szip-install
HDF5_ROOT=$BASE/hdf5-install

#WRF_ROOT=$BASE/WRFV4.5
#export PATH=$WRF_ROOT/main:$NETCDF_ROOT/bin:$PNETCDF_ROOT/bin:$HDF5_ROOT/bin:$SZ_ROOT/bin:$PATH
WRF_ROOT=/ssoft/spack/syrah/v1/opt/spack/linux-rhel8-icelake/gcc-11.3.0/wrf-4.5-3k4uylttabrety2iu2au24lc2thhycx6/main
export PATH=$WRF_ROOT/main:$PATH
export LD_LIBRARY_PATH=$NETCDF_ROOT/lib:$PNETCDF_ROOT/lib:$HDF5_ROOT/lib:$SZ_ROOT/lib64:$SZIP_ROOT/lib:$LD_LIBRARY_PATH

ulimit -s unlimited

#run real.exe -j 6
srun wrf.exe -j 6
```

Additional context
I started experiencing this issue as soon as I moved to my current institution, on a new supercomputer with a slightly different architecture than the one I previously worked on. I never had this issue before. The resident IT staff is unresponsive on this issue.
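One thing I have not been able to rule out yet on the new machine is a mismatch between the libraries the Spack-built wrf.exe was compiled against and the NetCDF/HDF5 trees exported in LD_LIBRARY_PATH above. A minimal check, assuming the same modules and environment variables as in the job script, would be something like:

```bash
# Sketch: list which NetCDF/PnetCDF/HDF5 shared libraries wrf.exe
# actually resolves with the job's environment (assumes the same
# module load and PATH/LD_LIBRARY_PATH exports as in the script above).
module load gcc openmpi
ldd "$(which wrf.exe)" | grep -Ei 'netcdf|pnetcdf|hdf5'
```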