[Derecho] MPAS-A stops writing output after initial time

jpiers
Hi,

I am running MPAS v8.0.1 on Derecho with a 60-10 km variable-resolution mesh. After I submit atmosphere_model to the queue (see run_model.pbs), the model runs for its full wall-time, but it only writes history and diag .nc files for the initialization date. The model doesn't crash, so I don't have any error files.

Note that I am using 16 nodes with 32 MPI processes each (512 MPI tasks in total), so I use the corresponding graph partitioning file (x6.999426.graph.info.part.512). I have also set the following in namelist.atmosphere accordingly:
&io
config_pio_num_iotasks = 16
config_pio_stride = 32
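
As a quick sanity check on these two settings, here is a minimal Python sketch of how 16 I/O tasks with a stride of 32 would map onto 512 MPI ranks. It assumes that I/O tasks start at rank 0 and that ranks are placed on nodes in contiguous blocks of 32; neither assumption is confirmed by the attached files, and the function name io_task_layout is made up for illustration.

```python
def io_task_layout(nodes, ranks_per_node, num_iotasks, stride, base=0):
    """Return (total MPI ranks, list of ranks that would act as I/O tasks).

    Assumption: I/O tasks sit at base, base + stride, base + 2*stride, ...
    """
    if num_iotasks < 1 or stride < 1:
        raise ValueError("num_iotasks and stride must both be at least 1")
    total = nodes * ranks_per_node
    io_ranks = [base + i * stride for i in range(num_iotasks)]
    if io_ranks[-1] >= total:
        raise ValueError(
            f"last I/O rank {io_ranks[-1]} does not fit in {total} MPI ranks"
        )
    return total, io_ranks


# Values from the post: 16 nodes x 32 ranks, 16 I/O tasks, stride 32.
total, io_ranks = io_task_layout(16, 32, 16, 32)

# Assumption: block placement, so rank r lives on node r // ranks_per_node.
io_nodes = sorted({r // 32 for r in io_ranks})

print(total)          # total MPI ranks, should match the .part.512 file
print(io_ranks)       # ranks that would perform I/O
print(len(io_nodes))  # number of distinct nodes hosting an I/O task
```

Under those assumptions the layout is self-consistent: one I/O task per node, and the rank count matches the partition file suffix.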
In streams.atmosphere, I also specify the following for each output type:
io_type="pnetcdf,cdf5"
Lastly, here are the modules currently used in Derecho:
Currently Loaded Modules:
1) ncarenv/23.09 (S)
2) intel-classic/2023.2.1
3) ncarcompilers/1.0.0
4) craype/2.7.23
5) cray-mpich/8.1.27
6) hdf5-mpi/1.12.2
7) netcdf-mpi/4.9.2
8) parallel-netcdf/1.12.3
9) parallelio/2.6.2
Thank you for your help!
 

Attachments

  • run_model.pbs.txt (1.2 KB)
  • streams.atmosphere.txt (1.8 KB)
  • namelist.atmosphere.txt (2.1 KB)
  • log.atmosphere.0000.out.txt (10.7 KB)
  • mpas_run.o3197639.txt (596.9 KB)
I haven't looked closely at this yet. As an initial guess, could you see if the recommendations in this thread resolve your issue? "MPAS-A hanging while writing output files"

One other suggestion might be to try with the ncarenv/23.06 module loaded instead. I have had a couple of issues with the 23.09 software stack on Derecho.
 
Thanks for the quick reply. Yes, I was taking a look at that thread and that led me to include io_type="pnetcdf,cdf5" in streams.atmosphere. Also, when running ncdump -k init.nc, I see the type is cdf5.

And it does not seem that ncarenv/23.06 makes a difference. Thanks for the suggestions.
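
For anyone without ncdump at hand, the same kind of check can be approximated by reading the first four bytes of the file. This is only a rough sketch: it relies on the standard netCDF magic numbers (CDF followed by version byte 1, 2, or 5) and the HDF5 signature at offset 0, it cannot tell netCDF-4 from netCDF-4 classic model, and the function name netcdf_kind and its return labels are my own.

```python
# Leading bytes of each on-disk format. The HDF5 signature is assumed to
# sit at offset 0, which is the usual case for netCDF-4 files.
MAGIC = {
    b"CDF\x01": "classic",
    b"CDF\x02": "64-bit offset",
    b"CDF\x05": "cdf5",
    b"\x89HDF": "netCDF-4 (HDF5-based)",
}


def netcdf_kind(path):
    """Guess the netCDF variant of a file from its first four bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    return MAGIC.get(head, "unknown")
```

Calling netcdf_kind("init.nc") on a CDF-5 file should give "cdf5", in line with what ncdump -k reports above.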
 
Then I think my two suggestions at the moment are:
- Try building and then running with SMIOL instead of PIO (as suggested in the other thread)
- Try building and running with the ncarenv/23.06 module loaded

Actually, given my second suggestion, I should ask: have you experienced this error in any other context? Or is this the first time you have noticed the issue?
 
Thanks, I'll give it a go. This is my first attempt at running MPAS on Derecho. I don't believe I had this issue on Cheyenne. If I did, it was likely solved by changing the partitioning, but I am using the same partitioning method as before.
 
Your tip to rebuild and run atmosphere_model with SMIOL seems to work! We'll see if the model completes... but not bad for a Friday evening.

As suggested in MPAS-A hanging while writing output files, I built atmosphere_model without the parallelio module. Note that this did not require a change to my init.nc file, so rebuilding init_atmosphere_model was unnecessary.
 