
15km uniform, Initial Condition failed

This post was from a previous version of the WRF & MPAS-A Support Forum. New replies have been disabled. If you have follow-up questions related to this post, please start a new thread from the forum home page.

makinde

New member
Good day
Thanks all for being there for us.

I am running the 15 km uniform-resolution mesh. I have read the forum reply to my colleague's question about the same 15 km mesh, which explained how to use the CVT mesh-decomposition file with 16 partitions (now available for the 15 km mesh) during the static initialization, and that the CVT partition file is not needed for the initial conditions, where the regular SCVT graph partition files can be used.
However, I have been having issues with the initial conditions. The static initialization ran successfully with the CVT partition file, but the initial-condition step keeps failing without any error message, only an exit status of 139.
At first I ran it with the CVT file and 16 MPI tasks; x1.2621442.init.nc was created but empty. I then ran it with the normal SCVT partition file and 240 MPI tasks; x1.2621442.init.nc was again produced but was also empty, this time with the error "Bad return value from PIO".
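
For reference, the init_atmosphere_model program selects its graph partition file from the number of MPI tasks and the prefix set in the &decomposition namelist group; the prefix below is just the conventional name for this mesh and may differ in a given setup:
Code:
&decomposition
    config_block_decomp_file_prefix = 'x1.2621442.graph.info.part.'
/
! with 240 MPI tasks, the file x1.2621442.graph.info.part.240 must exist
! with 16 MPI tasks, x1.2621442.graph.info.part.16 (or the special
! CVT-based 16-partition file) must exist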

Please, what could be wrong, and what must I do?

I have tried different combinations and have rerun this more than 10 times, with the same result each time.
Please find log.init_atmosphere.0000.out, log.init_atmosphere.0000.err, static_stderr, and the namelist attached.


Thank you for your usual help
 

Attachments

  • namelist.init_atmosphere.txt (1.4 KB)
  • log.init_atmosphere.0000.out.txt (16.4 KB)
  • log.init_atmosphere.0000.err.txt (539 bytes)
  • static_stderr.txt (240.3 KB)
I am having the same problem.

In a previous post I was assisted with the 15 km uniform mesh static initialization. That was successful; however, I have been unable to use the same CVT partition file to perform the meteorological data initialization. The init file is created, but the job freezes and eventually terminates with exit code 137/139.

Please could you offer us some assistance as we have tried every possible solution.
Thank you for your time,
Paige
 
The error messages
ERROR: MPAS IO Error: Bad return value from PIO
are most likely a result of some variables (e.g., 'zb3', which is dimensioned by [nVertLevels+1, 2, nEdges]) exceeding the format restrictions of the default CDF-2 file format (which permits files to be larger than 4 GB while still restricting all individual variables or records to less than 4 GB in size).
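
As a rough back-of-envelope check (assuming the x1.2621442 mesh, which has about 7.86 million edges, and the default 55 vertical levels; your dimensions may differ), 'zb3' alone comes out well above that limit:
Code:
zb3 size ≈ (nVertLevels+1) * 2 * nEdges * 8 bytes
         ≈ 56 * 2 * 7,864,320 * 8
         ≈ 7.0 GB   >   4 GB CDF-2 per-variable limit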

The easiest way to work around this issue is to switch to either CDF-5 or HDF5 as the output format for the initial conditions written by the init_atmosphere_model program. In the streams.init_atmosphere file, you could set io_type="pnetcdf,cdf5" to use the CDF-5 format, which in my experience offers much better performance than HDF5:
Code:
<immutable_stream name="output"
                  type="output"
                  io_type="pnetcdf,cdf5"
                  filename_template="init.nc"
                  packages="initial_conds"
                  output_interval="initial_only" />

The segmentation fault when using just 16 MPI tasks might result from not having enough memory if all 16 MPI tasks were running on the same node. I would guess that around 300 GB of aggregate memory across all MPI tasks might be required to produce the 15-km initial conditions, so if a single node has less than this amount, running across multiple nodes would be required.
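
For what it's worth, on a PBS Pro system a multi-node request along these lines would give 16 tasks access to the memory of four nodes rather than one; the node, core, and memory numbers here are only illustrative and need to be adapted to your site's queues:
Code:
#PBS -l select=4:ncpus=4:mpiprocs=4:mem=120gb
#PBS -l walltime=02:00:00

cd $PBS_O_WORKDIR
mpiexec -n 16 ./init_atmosphere_model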
 
Thank you Mgduda,
I actually get the
ERROR: MPAS IO Error: Bad return value from PIO
error when I set the io_type of the output in the streams.init_atmosphere file to pnetcdf.
As for the number of MPI tasks and the memory limit: I am running on CHPC, which has a list of queues for jobs. The best queue available to my account is called "normal", which allows up to 240 MPI tasks across all nodes but will not accept a 16-task job because it is too small. The other queue that does allow that is called "serial"; the documentation says not to run parallel jobs in it, but it does allow setting ncpus and nodes. So I have been using the "serial" queue. Could that be the cause?

Good news on this: I have now been able to run all of the initialization steps successfully.
I did that by setting io_type="netcdf" for all of the output and input streams of the met and surface initialization, and it all works fine.
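
For illustration, that change presumably looked something like the following in the streams.init_atmosphere file (the stream name and filename template here are just the usual defaults and may not match the actual setup):
Code:
<immutable_stream name="output"
                  type="output"
                  io_type="netcdf"
                  filename_template="x1.2621442.init.nc"
                  packages="initial_conds"
                  output_interval="initial_only" />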

But I am having the same issue again with the model run itself; for the model run I am using 128 MPI tasks.
Please, can you make any suggestion as to what could be done?
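
If the model run is hitting the same PIO error, then, following the earlier suggestion in this thread, the same io_type change would presumably need to be applied to the large streams in the streams.atmosphere file as well, e.g. for the restart stream (only a sketch, not a confirmed fix; the filename template shown is the default one):
Code:
<immutable_stream name="restart"
                  type="input;output"
                  io_type="pnetcdf,cdf5"
                  filename_template="restart.$Y-$M-$D_$h.$m.$s.nc"
                  input_interval="initial_only"
                  output_interval="1_00:00:00" />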

Thank you
 