aio_write issue MPAS run on Mac

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

xtian15

New member
Hi,
I was trying to run the idealized JW wave case with MPAS-Atmosphere on a recent Mac with 8 M1 CPU cores. For both init_atmosphere_model and atmosphere_model, everything works perfectly with 1 or 2 cores. But if the job is run with more than 2 cores, it simply hangs without making any progress or terminating. The following messages show up, but there are no complaints at all in the logs.
Code:
GPTLstart name=PIO:write_darray_multi_par: GPTLinitialize has not been called
GPTLstop: GPTLinitialize has not been called
GPTLstop: GPTLinitialize has not been called
GPTLstop: GPTLinitialize has not been called
GPTLstop: GPTLinitialize has not been called
mca_fbtl_posix_ipwritev: error in aio_write():  Resource temporarily unavailable
vulcan_write_all: fbtl_ipwritev failed

MPAS-Atmosphere was compiled in single precision with PIO2.
Thanks in advance!
 
Is the job hanging when reading initial conditions or when writing model output, or does it hang during normal time steps? If it appears that the error is related to model I/O, you could try using just a single I/O task with the following namelist settings (e.g., for 8 MPI tasks):
Code:
&io
    config_pio_num_iotasks = 1
    config_pio_stride = 8
/
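
If you change the number of MPI tasks often, a small helper can regenerate this group so the stride always matches the task count. This is only a sketch: the function name single_io_task_namelist is made up for illustration, and it simply assumes that the pattern above (one I/O task, stride equal to the total number of MPI tasks) carries over to other task counts.

```python
def single_io_task_namelist(n_mpi_tasks: int) -> str:
    """Return an &io namelist group that requests a single I/O task.

    Assumption: as in the 8-task example above, the stride is set equal
    to the total number of MPI tasks so that only one task performs I/O.
    """
    if n_mpi_tasks < 1:
        raise ValueError("n_mpi_tasks must be a positive integer")
    return (
        "&io\n"
        "    config_pio_num_iotasks = 1\n"
        f"    config_pio_stride = {n_mpi_tasks}\n"
        "/\n"
    )


if __name__ == "__main__":
    # Print the group for an 8-task run; paste it into the model namelist.
    print(single_io_task_namelist(8), end="")
```

The printed text would replace any existing &io group in the namelist file used for the run.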
 
The job starts hanging when it first attempts to write output, i.e., the first history file. In the log file, the time integration has not started but is just about to. The error is indeed related to model I/O. When I happen to forget to remove an existing output.nc, which triggers the clobber mode complaint, produces .err log files, and makes the model skip the I/O step, the integration instead moves on as expected.
Your &io modification solved the issue :D
Thanks very much!
 
Thanks for following up -- it's good to know that switching to effectively serial I/O (with just one I/O task) resolves the issue!
 