
Model crashes on writing first restart file

mdtoy65

New member
Hello,
I'm running MPAS-A on the 240km uniform global mesh with the 'mesoscale_reference' physics suite. I'm using 128 MPI tasks, and these I/O settings in namelist.atmosphere:
&io
config_pio_num_iotasks = 20
config_pio_stride = 4
/

The model outputs diag.*nc and history*.nc files, but crashes when beginning to write the first restart.nc file.
The traceback from the debug-mode run is as follows:
atmosphere_model. 0000000003E504EA write_chunk_pnetc 2711 smiol.c
atmosphere_model. 0000000003E4D91A SMIOL_put_var 1261 smiol.c
atmosphere_model. 0000000003E1C517 smiolf_mp_smiolf_ 1573 smiolf_put_get_var.inc
atmosphere_model. 0000000003ACFA1E mpas_io_mp_mpas_i 3466 mpas_io.F
atmosphere_model. 0000000003ADF5AD mpas_io_mp_mpas_i 4175 mpas_io.F
atmosphere_model. 0000000003DE24DB mpas_io_streams_m 3669 mpas_io_streams.F
atmosphere_model. 0000000003B3015E mpas_stream_manag 3373 mpas_stream_manager.F
atmosphere_model. 0000000003B2C079 mpas_stream_manag 2837 mpas_stream_manager.F
atmosphere_model. 000000000094FD58 atm_core_mp_atm_c 835 mpas_atm_core.F
atmosphere_model. 000000000041CF8C mpas_subdriver_mp 417 mpas_subdriver.F
atmosphere_model. 0000000000416EDD MAIN__ 20 mpas.F

The code in smiol.c around the failing line (2711 in the traceback) is:
2704 /*
2705 * If making a buffered write would cause the remaining buffer
2706 * size to be exceeded on any task, wait for non-blocking
2707 * writes to complete
2708 */
2709 if ((size_t)max_usage > file->bufsize
2710 || file->n_reqs == MAX_REQS) {
2711 ierr = ncmpi_wait_all(file->ncidp, file->n_reqs,
2712 file->reqs, NULL); /* statuses */
2713 file->n_reqs = 0;
2714 }
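For context, my understanding is that this is the usual PnetCDF pattern of queueing non-blocking writes and then flushing them collectively with ncmpi_wait_all. Below is a minimal standalone sketch of that general pattern as I understand it -- my own illustration against the stock PnetCDF API, not the SMIOL code, and the file/variable names are made up:

/* Illustrative sketch only (not SMIOL): queue a non-blocking PnetCDF
 * write on each rank, then flush with the collective ncmpi_wait_all,
 * which is the call shown at smiol.c:2711 in the traceback above. */
#include <stdio.h>
#include <mpi.h>
#include <pnetcdf.h>

#define NCELLS 16   /* values written per rank; arbitrary for the sketch */

static void check(int err)
{
    if (err != NC_NOERR) {
        fprintf(stderr, "PnetCDF error: %s\n", ncmpi_strerror(err));
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

int main(int argc, char **argv)
{
    int rank, nprocs, ncid, dimid, varid, req;
    MPI_Offset start[1], count[1];
    double buf[NCELLS];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* NC_CLOBBER gives a classic-format file; variables beyond the
     * classic-format size limits would need NC_64BIT_DATA (CDF-5). */
    check(ncmpi_create(MPI_COMM_WORLD, "sketch.nc", NC_CLOBBER,
                       MPI_INFO_NULL, &ncid));
    check(ncmpi_def_dim(ncid, "nCells", (MPI_Offset)nprocs * NCELLS, &dimid));
    check(ncmpi_def_var(ncid, "theta", NC_DOUBLE, 1, &dimid, &varid));
    check(ncmpi_enddef(ncid));

    for (int i = 0; i < NCELLS; i++)
        buf[i] = rank + 0.01 * i;

    start[0] = (MPI_Offset)rank * NCELLS;
    count[0] = NCELLS;

    /* Queue the write; nothing is written to the file yet */
    check(ncmpi_iput_vara_double(ncid, varid, start, count, buf, &req));

    /* Collective flush of all queued requests -- the same call the
     * traceback shows failing inside SMIOL's buffered-write path */
    check(ncmpi_wait_all(ncid, 1, &req, NULL));

    check(ncmpi_close(ncid));
    MPI_Finalize();
    return 0;
}

(Built with something like: mpicc sketch.c -lpnetcdf -o sketch.)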

I'm wondering if the issue is that the resulting restart.nc file would be too large. Do I need to enable large-file ('big') netCDF I/O support when compiling?

Thank you.

-- Mike
 
Mike,

Would you please upload your namelist.atmosphere and streams.atmosphere for me to take a look? Which version of MPAS did you run? Thanks.
 
I think there may be an issue with SMIOL when the product of config_pio_num_iotasks and config_pio_stride doesn't equal the total number of MPI ranks. Could you try the run again, but using
&io
config_pio_num_iotasks = 32
config_pio_stride = 4
/
for example?
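With 128 MPI tasks, 32 x 4 matches the total rank count exactly, which is the condition I'm suggesting SMIOL may need. If that does turn out to be the cause, any other pair whose product is 128 should behave the same way; another purely hypothetical combination would be

&io
config_pio_num_iotasks = 16
config_pio_stride = 8
/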
 
Thank you for the recommendation. I made the I/O namelist adjustments, but the problem still occurs. I think this may be an issue with our GSL fork of MPAS. I'm not able to reproduce the problem with the code from NCAR's MPAS-Dev repository. I will keep troubleshooting.
 