Model crashes on writing first restart file

mdtoy65

Hello,
I'm running MPAS-A on the 240km uniform global mesh with the 'mesoscale_reference' physics suite. I'm using 128 MPI tasks, and these I/O settings in namelist.atmosphere:
&io
config_pio_num_iotasks = 20
config_pio_stride = 4
/

The model writes the diag.*.nc and history.*.nc files, but crashes when it begins writing the first restart.nc file.
The debug-mode messages are as follows:
atmosphere_model. 0000000003E504EA write_chunk_pnetc 2711 smiol.c
atmosphere_model. 0000000003E4D91A SMIOL_put_var 1261 smiol.c
atmosphere_model. 0000000003E1C517 smiolf_mp_smiolf_ 1573 smiolf_put_get_var.inc
atmosphere_model. 0000000003ACFA1E mpas_io_mp_mpas_i 3466 mpas_io.F
atmosphere_model. 0000000003ADF5AD mpas_io_mp_mpas_i 4175 mpas_io.F
atmosphere_model. 0000000003DE24DB mpas_io_streams_m 3669 mpas_io_streams.F
atmosphere_model. 0000000003B3015E mpas_stream_manag 3373 mpas_stream_manager.F
atmosphere_model. 0000000003B2C079 mpas_stream_manag 2837 mpas_stream_manager.F
atmosphere_model. 000000000094FD58 atm_core_mp_atm_c 835 mpas_atm_core.F
atmosphere_model. 000000000041CF8C mpas_subdriver_mp 417 mpas_subdriver.F
atmosphere_model. 0000000000416EDD MAIN__ 20 mpas.F

The 'smiol.c' line where the program ends is:
2704 /*
2705 * If making a buffered write would cause the remaining buffer
2706 * size to be exceeded on any task, wait for non-blocking
2707 * writes to complete
2708 */
2709 if ((size_t)max_usage > file->bufsize
2710 || file->n_reqs == MAX_REQS) {
2711 ierr = ncmpi_wait_all(file->ncidp, file->n_reqs,
2712 file->reqs, NULL); /* statuses */
2713 file->n_reqs = 0;
2714 }

I'm wondering whether the issue is that the resulting restart.nc file would be too large. Do I need to enable 'big' (large-file) netCDF I/O support when compiling?

Thank you.

-- Mike
 
Mike,

Would you please upload your namelist.atmosphere and streams.atmosphere files so I can take a look? Which version of MPAS are you running? Thanks.
 
I think there may be an issue with SMIOL when the product of config_pio_num_iotasks and config_pio_stride doesn't equal the total number of MPI ranks (here 20 x 4 = 80, but the run uses 128 ranks). Could you try the run again, but using
&io
config_pio_num_iotasks = 32
config_pio_stride = 4
/
for example?
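
The condition above can be checked before submitting a job. This is only a minimal sketch and not part of MPAS: the function name check_pio_layout is hypothetical, and the rule that the product should equal the MPI task count comes from the suggestion in this thread, not from documented SMIOL behavior.

```python
def check_pio_layout(num_mpi_tasks, num_iotasks, stride):
    """Return True when num_iotasks * stride equals the MPI task count.

    This encodes only the condition suggested in this thread; it is an
    assumption, not a documented SMIOL requirement.
    """
    return num_iotasks * stride == num_mpi_tasks


# Settings from the original post: 20 * 4 = 80, which does not match 128 tasks.
print(check_pio_layout(128, 20, 4))

# Suggested settings: 32 * 4 = 128, which matches the task count.
print(check_pio_layout(128, 32, 4))
```

With 128 tasks, other pairs such as 64 and 2, or 16 and 8, would satisfy the same condition.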
 
Thank you for the recommendation. I made the I/O namelist adjustments, but the problem still occurs. I think this may be an issue with our GSL fork of MPAS, since I'm not able to reproduce the problem with the code from NCAR's MPAS-Dev repository. I will keep troubleshooting.
 