Static Initialization Failure - 15km uniform mesh

paige_d
Posts: 4
Joined: Mon Jun 08, 2020 10:01 am


Post by paige_d » Mon Jun 08, 2020 10:15 am

Good morning,

I am a new MPAS user and am struggling to complete the static initialization for one of the uniform-resolution meshes (15 km, x1.2621442). I have completed this step without any problems on several other meshes (15-km variable-resolution, 25-km variable-resolution, 240-km uniform-resolution and 240-km variable-resolution).
The job is terminated at the same point on every attempt with exit code 137, and the static error file (below) suggests that this is due to a memory limit.

I would appreciate any advice as to how I may resolve this. Thanks!
Operating system error: Cannot allocate memory
Allocation would exceed memory limit

Error termination. Backtrace:

Could not print backtrace: mmap: Cannot allocate memory
#0 0x2aaaab8b1d8a
#1 0x2aaaab8b2865
#2 0x2aaaab8b2a62
#3 0x55d506
#4 0x5082c5
#5 0x50f886
#6 0x43a88c
#7 0x408a60
#8 0x40802b
#9 0x2aaaac551b34
#10 0x408062
#11 0xffffffffffffffff
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node cnode0629 exited on signal 9 (Killed).
--------------------------------------------------------------------------
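For reference, exit code 137 follows the shell convention of reporting a process killed by signal N as 128 + N, so 137 = 128 + 9, i.e. SIGKILL, which agrees with the "exited on signal 9 (Killed)" line from mpirun. A process that exceeds a memory allocation is often terminated this way. The bash sketch below decodes such a status; the function name describe_exit_status is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: interpret an exit status as reported by a POSIX-style shell.
# A child killed by signal N is usually reported as 128 + N,
# so 137 = 128 + 9 (SIGKILL).
describe_exit_status() {
  local status=$1
  if [ "$status" -gt 128 ]; then
    local sig=$((status - 128))
    # `kill -l N` prints the signal name without the SIG prefix (e.g. KILL)
    echo "killed by signal $sig (SIG$(kill -l "$sig"))"
  elif [ "$status" -eq 0 ]; then
    echo "success"
  else
    echo "exited with error code $status"
  fi
}

describe_exit_status 137
```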

mcurry
Posts: 31
Joined: Mon Oct 29, 2018 5:33 pm
Location: Boulder, Co


Post by mcurry » Mon Jun 08, 2020 5:42 pm

The current implementation of the static initialization is quite memory intensive. This commonly leads to this type of error because systems often limit the amount of memory that a single program can allocate (for example, through the stack size limit).

Depending on your shell, you can remove the restriction on the stack size by running one of the following commands on the command line before running the static initialization:

For sh-based shells (sh, bash, zsh):

ulimit -s unlimited

For csh-based shells (csh, tcsh):

limit stacksize unlimited

If you are not sure which shell you are using, you can find out by printing the value of the SHELL environment variable:

echo $SHELL

This should remove the limit on the stack size and therefore allow the static initialization to complete successfully. However, please let us know if it does not!
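The shell check and the two commands above can be combined in a small bash sketch that maps the value of $SHELL to the matching command and then prints the limits currently in effect. The helper name stack_limit_cmd is hypothetical. Printing the virtual-memory limit alongside the stack size may help, since the stack is only one of the per-process limits a system can impose.

```shell
#!/usr/bin/env bash
# Sketch: map a login-shell path (as printed by `echo $SHELL`) to the
# stack-size command for that shell family, then show current limits.
stack_limit_cmd() {
  case "$(basename "$1")" in
    sh|bash|zsh) echo "ulimit -s unlimited" ;;
    csh|tcsh)    echo "limit stacksize unlimited" ;;
    *)           echo "unrecognized shell: $1" >&2; return 1 ;;
  esac
}

stack_limit_cmd "${SHELL:-/bin/sh}"

# Soft limits of the current bash process (in KiB, or "unlimited").
echo "stack size:     $(ulimit -s)"
echo "virtual memory: $(ulimit -v)"
```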
NCAR/MMM

paige_d
Posts: 4
Joined: Mon Jun 08, 2020 10:01 am


Post by paige_d » Tue Jun 09, 2020 10:45 am

Thank you very much for your response.

I have tried the static initialization again after including 'ulimit -s unlimited' in my qsub script; however, the issue persists and the job stops at the same point as before.

Are there any files/scripts I could provide that may help us identify the problem?

Thanks for your time!

mgduda
Posts: 319
Joined: Mon Feb 26, 2018 7:35 pm


Post by mgduda » Tue Jun 09, 2020 4:43 pm

By using a mesh decomposition in which each partition represents a convex region, it is possible to interpolate the static terrestrial fields in parallel following the steps outlined on the mesh download page for the 10-km quasi-uniform and 3-km quasi-uniform meshes. Although we don't yet have a convex partition file for the 15-km quasi-uniform mesh, I can produce one and make it available very soon.
NCAR/MMM

mgduda
Posts: 319
Joined: Mon Feb 26, 2018 7:35 pm


Post by mgduda » Wed Jun 10, 2020 2:24 am

I've attached a gzipped convex partition file with 16 partitions to this post. You can interpolate the static terrestrial fields in parallel for the 15-km quasi-uniform mesh using this file, and by spreading the 16 MPI tasks (corresponding to the 16 partitions) across multiple nodes, you can make use of more aggregate memory, hopefully working around the memory allocation error you had previously encountered.
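Spreading the 16 tasks comes down to choosing how many tasks share a node. The bash sketch below computes how many nodes are needed when only a given number of tasks fit in one node's memory. The function name nodes_needed and the value of 4 tasks per node are hypothetical; the right value depends on the node memory on your system.

```shell
#!/usr/bin/env bash
# Sketch: number of nodes needed to place N MPI tasks (one per partition)
# when at most `per_node` tasks fit in a single node's memory.
nodes_needed() {
  local ntasks=$1 per_node=$2
  echo $(( (ntasks + per_node - 1) / per_node ))   # ceiling division
}

ntasks=16     # x1.2621442.cvt.part.16 defines 16 partitions
per_node=4    # hypothetical: tasks that fit in one node's memory
echo "nodes: $(nodes_needed "$ntasks" "$per_node")"
```

With Open MPI (which the mpirun messages in the first post suggest), a placement of 4 tasks per node can be requested with something like `mpirun -np 16 --map-by ppr:4:node ./init_atmosphere_model`, provided the batch job has reserved at least that many nodes. Other MPI launchers and schedulers use different options.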

As mentioned on the mesh download page for the quasi-uniform 10-km and 3-km meshes, you'll need to comment out lines 217 through 222 of src/core_init_atmosphere/mpas_init_atm_cases.F. It might also be good to set the io_type for the "output" stream to io_type="pnetcdf,cdf5". The only other change needed is in the namelist.init_atmosphere file, where you can set

&decomposition
    config_block_decomp_file_prefix = 'x1.2621442.cvt.part.'
/
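MPAS forms the partition file name by appending the number of MPI tasks to config_block_decomp_file_prefix, so with the prefix above and 16 tasks it reads x1.2621442.cvt.part.16. The bash sketch below checks for that file (decompressing the attachment if needed) before a job is submitted. The helper names are hypothetical, and the partition count assumes the file holds one partition index per mesh cell, one per line, as in METIS-style partition files.

```shell
#!/usr/bin/env bash
# Sketch: MPAS reads <config_block_decomp_file_prefix><number of MPI tasks>.
partition_file() {
  echo "${1}${2}"   # $1 = prefix, $2 = number of MPI tasks
}

# Assumes one partition index per line; counts the distinct indices.
count_partitions() {
  sort -u "$1" | wc -l | tr -d ' '
}

prefix='x1.2621442.cvt.part.'
ntasks=16
f=$(partition_file "$prefix" "$ntasks")

# The attachment is gzipped; decompress it if only the .gz file is present.
if [ ! -f "$f" ] && [ -f "$f.gz" ]; then
  gunzip "$f.gz"
fi

if [ -f "$f" ]; then
  echo "$f defines $(count_partitions "$f") partitions; run with $ntasks MPI tasks"
else
  echo "missing $f" >&2
fi
```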

I've also updated the 15-km mesh download so that the convex partition file with 16 partitions will be available for anyone who downloads that mesh in future. If you encounter any other issues in interpolating the static fields for this mesh, please don't hesitate to follow-up in this thread.
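For the source change mentioned in this post (commenting out lines 217 through 222 of src/core_init_atmosphere/mpas_init_atm_cases.F), one cautious approach is to print the range first, because line numbers differ between MPAS versions, and only then prefix each line with the Fortran comment character "!". The bash sketch below does that; the function name comment_out_lines is hypothetical, and it writes through a backup copy instead of relying on sed -i, whose syntax differs between GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Sketch: comment out a line range in a free-form Fortran source file by
# prefixing each line with '!'. Keeps the original as <file>.orig.
comment_out_lines() {
  local file=$1 first=$2 last=$3
  cp "$file" "$file.orig"
  sed "${first},${last}s/^/!/" "$file.orig" > "$file"
}

src=src/core_init_atmosphere/mpas_init_atm_cases.F
if [ -f "$src" ]; then
  # Line numbers are version-specific: review this range before editing.
  sed -n '217,222p' "$src"
fi
```

After confirming that the printed lines are the ones described on the mesh download page, running `comment_out_lines "$src" 217 222` applies the change. If the range contains preprocessor directives (lines starting with #), check that any #if/#endif pairs stay balanced. The init_atmosphere core needs to be rebuilt afterward for the change to take effect.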
Attachments
x1.2621442.cvt.part.16.gz
(163.4 KiB)
NCAR/MMM

paige_d
Posts: 4
Joined: Mon Jun 08, 2020 10:01 am


Post by paige_d » Wed Jun 10, 2020 2:02 pm

Thank you so much for the detailed response.

I do not currently have access to the mpas_init_atm_cases.F file, so it may take some time for me to test this. I will post an update to let you know if I was successful as soon as possible.

Thanks!

paige_d
Posts: 4
Joined: Mon Jun 08, 2020 10:01 am


Post by paige_d » Wed Jun 17, 2020 12:13 pm

UPDATE:

Following the steps above, once I was able to comment out lines 217 - 222 of src/core_init_atmosphere/mpas_init_atm_cases.F, my static initialization completed successfully!

I am extremely grateful for the help, thank you so much.

mgduda
Posts: 319
Joined: Mon Feb 26, 2018 7:35 pm


Post by mgduda » Wed Jun 17, 2020 4:33 pm

Thanks very much for the update, and it's good to know that the use of a CVT partition file to interpolate the static fields in parallel worked!
NCAR/MMM
