Problem generating static file for 15 km with 3 km refinement mesh

jhegarty · Sep 8, 2021

Hi,

I am having trouble generating the static fields for the x5.6488066 15 km grid with 3 km refinement on Cheyenne. I am using MPAS Atmosphere version 7 and have re-built init_atmosphere_model after commenting out lines 217 – 222 in src/core_init_atmosphere/mpas_init_atm_cases.F so that the static fields can be interpolated in parallel. This was suggested on the MPAS-Atmosphere web page describing MPAS Meshes for high resolution runs. I have selected 8 nodes with 32 processors per node for a total of 256 processors. I have also set config_block_decomp_file_prefix = "meshgrid/x5.6488066.cvt.part." so that the proper partition files will be used, and I downloaded that partition file to the meshgrid sub-directory of my run directory.
The executable runs for some time and generates many lines in the log.init_atmosphere.0000.out file, but a static.nc file is not generated. The .err files contain the following message:
ERROR: Error reading topography tile /glade/work/wrfhelp/WPS_GEOG/topo_gmted2010_30s/42001-43200.06001-07200
ERROR: Error reading global 30-arc-sec topography for GWD statistics
ERROR: ****************************************************************
ERROR: Error while trying to compute sub-grid-scale orography
ERROR: statistics for use with the GWDO scheme.
CRITICAL ERROR: ****************************************************************

I have verified that the file being read exists. I also initially tried generating the static files in serial, but it also didn’t work. However, the .err file for the serial run got overwritten before I had a chance to verify whether the error above was produced.
I noticed in a related thread that you recommended setting io_type="pnetcdf,cdf5" for the output stream; however, since this looks to be an input error, I wasn't sure this would work.

What else should I try to get this to work?
Thank you for your help.

Jen

mcurry · Sep 9, 2021

Hi Jen, your setup is very close to working; however, I believe the Cheyenne nodes are running out of memory. I think it will complete if you under-subscribe the nodes. For the GWDO scheme interpolation, each MPI task needs to read in 4 GB of data. While Cheyenne's standard nodes advertise 64 GB, only about 45 GB of that is usable.

In your current setup, with 32 tasks per node, the amount of memory any one node would require is 128 GB (32 * 4). Thus, we need to under-subscribe the nodes so that the memory required on any one node stays below what is usable. For instance, you could try 32 nodes and 8 cpus/tasks per node:

Code:

#PBS -l select=32:ncpus=8:mpiprocs=8

This would require only 32 GB per node (8 * 4), well under the 45 GB limit.

If you wanted to try fewer nodes, you could try 16 nodes and 16 cpus/tasks per node, but you would need to request the big-memory nodes, which might have a longer wait time in the queue:

Code:

#PBS -l select=16:ncpus=16:mpiprocs=16:mem=109GB

These nodes will use 64 GB of memory, which is under the 109 GB of usable memory on the big-memory nodes.
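
The per-node arithmetic above can be checked with a small sketch. The 4 GB per task and the 45 GB / 109 GB usable figures are taken from this post; the function names are hypothetical and only for illustration.

```python
# Minimal sketch of the under-subscription arithmetic described above.
# Assumption (from the post): each MPI task reads about 4 GB during the
# GWDO static interpolation. Function names are illustrative only.

GB_PER_TASK = 4


def node_memory_gb(tasks_per_node, gb_per_task=GB_PER_TASK):
    """Memory one node needs when it hosts tasks_per_node MPI tasks."""
    return tasks_per_node * gb_per_task


def max_tasks_per_node(usable_gb, gb_per_task=GB_PER_TASK):
    """Largest task count per node that stays within usable memory."""
    return int(usable_gb // gb_per_task)


if __name__ == "__main__":
    layouts = [
        ("32 tasks/node on standard nodes", 32, 45),
        ("8 tasks/node on standard nodes", 8, 45),
        ("16 tasks/node on big-memory nodes", 16, 109),
    ]
    for label, tasks, usable in layouts:
        need = node_memory_gb(tasks)
        print(f"{label}: needs {need} GB, usable {usable} GB, fits: {need <= usable}")
```

Under these assumptions, a standard node could hold at most `max_tasks_per_node(45)` tasks for this step, so any layout at or below that count should also fit.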

Lastly, you do not need to specify io_type as "pnetcdf,cdf5" for the static file, as no fields are above the 4 GB limit imposed by the classic NetCDF format; however, you will need to specify this for the output when you create initial conditions.

As a side note, and from my own personal experience, whenever I work with a large mesh like this, I generally set io_type = "pnetcdf,cdf5" for every run, whether it's actually "needed" or not. Mainly, this just ensures that I have it set, rather than discovering that I had forgotten to set it after a long run failed during the output.
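
To see roughly why the static file stays under the limit while 3-d initial-condition fields may not, here is a rough sketch. It assumes single-precision (4-byte) values and treats the 4 GB limit mentioned above as 2**32 bytes per variable; both are assumptions, and the function names are hypothetical.

```python
# Rough sketch: estimate the size of one field and compare it to the
# 4 GB limit mentioned above.
# Assumptions (not from the post): 4-byte values, limit taken as 2**32 bytes.

LIMIT_BYTES = 2**32


def field_bytes(n_cells, n_levels=1, bytes_per_value=4):
    """Approximate size in bytes of one cell-based field."""
    return n_cells * n_levels * bytes_per_value


def needs_cdf5(n_cells, n_levels=1, bytes_per_value=4):
    """True if a single field would exceed the assumed limit."""
    return field_bytes(n_cells, n_levels, bytes_per_value) > LIMIT_BYTES


if __name__ == "__main__":
    n_cells = 6488066  # cell count of the x5.6488066 mesh
    print("2-d field exceeds limit:", needs_cdf5(n_cells))
    print("3-d field with 269 levels exceeds limit:", needs_cdf5(n_cells, 269))
```

This ignores edge- and vertex-based fields and double-precision builds, which would make fields larger, so it is only a lower-bound check.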

Let me know if the above works for you or not.

jhegarty · Sep 9, 2021

Hi MCurry,

Thank you for the suggestions. I ended up using 32 nodes with 8 processors per node as below and that worked.
#PBS -l select=32:ncpus=8:mpiprocs=8

With this new static file I am trying to generate the initial conditions file with these same settings and it isn't working. I don't get any .err files, but the program aborts before writing the initial conditions file. I have set io_type="pnetcdf,cdf5" in the output stream and have verified in the log file that this option has been set properly. I am running the program off of my /glade/work partition but am writing the initial conditions file to /glade/scratch to ensure that space is not an issue. I see on my scratch partition that a .nc file with a name starting with "x5.6488066" exists but has 0 bytes so I know it is attempting to write to the correct location.
Is there a different node/processor setting you would recommend for this run? My file will have 269 vertical levels.

Thanks,

Jen

mcurry · Sep 9, 2021

Hi Jen, Glad that worked.

Now that we have finished with the static interpolation, we do not need to use the CVT partition file for other MPAS operations, including applying the initial conditions. As well, you do not need to under-subscribe the nodes as you did in the static interpolation step, so I would recommend using as many cpus/tasks per node as possible.

Your setup seems fine, but I would recommend running with a higher task count. Because you want to generate initial conditions on a significantly higher number of vertical levels than a standard MPAS configuration uses, I believe you will want to use more cpus/tasks rather than fewer.

As another suggestion, you may want to first interpolate initial conditions to your 15-3 km static file with the default 58 levels as an initial exercise. That will help confirm your setup is correct and will also help you gauge how many tasks you need for interpolating to the 269 vertical levels. Although I have not interpolated 269 vertical levels, I think it will need about 4-5 times as many cpus/tasks as the 58-level case.

jhegarty · Sep 22, 2021

Hi,
I was eventually able to generate initial conditions for the x5.6488066 15-3km run with 268 vertical levels. It required 64 nodes to get past the memory issues. I am now trying to get MPAS to work for this grid. I first tried 512 nodes. This took about a day to get submitted and exited without generating any .err files after writing out the first time and running for about 5 seconds. Next, I tried the same number of nodes but increased the maximum memory per node to 109 GB using the PBS setting
#PBS -l select=512:ncpus=8:mpiprocs=8:mem=109GB
I submitted this job September 15, 10:11 mountain time and it took until September 19, 18:07 for it to get submitted, which is 4 1/3 days. The result was the same as before: it ran for only a very short time before exiting.
I found in an earlier thread on this issue that a rule of thumb for memory requirements is 0.178 MB/cell. So, for my configuration that would be
0.178 x 6488066 x 268 = 309,506,700.464 MB or ~309,507 GB.
Assuming 45 GB of usable memory per node, that would require 6,878 nodes, which is more than are available on Cheyenne. Assuming the large-memory nodes, which have 109 GB of usable memory, that would be 2,840 nodes.
Given that Cheyenne has 4032 nodes, the only setting that might work with the available grid partition files would be
#PBS -l select=3072:ncpus=2:mpiprocs=2:mem=109GB
This would use the 6144 partition file. I could also generate new partition files in even multiples of 2900 (e.g., 11600, 23200 etc) if you think that would be better. In any case it looks like my job would still require more than half of the Cheyenne nodes. I should point out that I am also hoping to use the convection permitting suite.
Could you suggest anything else to try?
Thanks

Jen

mgduda · Sep 22, 2021

Regarding the memory estimate, I think I didn't state this as well as I could have in the thread where this is discussed. The 0.175 MB figure is actually per grid column (assuming 55 vertical levels, and running in single-precision). So, I think you may need around 6488066 columns * 0.175 MB/column * (268/55) = 5533 GB. With 45 GB of usable memory per node, I think the simulation should run on Cheyenne with around 123 nodes; rounding this up to around 150 nodes for good measure might not hurt.
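
The estimate above can be reproduced with a short sketch. The 0.175 MB per column at 55 levels and the 45 GB usable-memory figure come from this thread; linear scaling with level count and decimal units (1 GB = 1000 MB) are assumptions, and the function names are hypothetical.

```python
import math

# Sketch of the per-column memory estimate described above.
# Assumptions: memory scales linearly with the number of vertical levels,
# and 1 GB = 1000 MB. Function names are illustrative only.


def estimate_memory_gb(n_columns, n_levels, mb_per_column=0.175, ref_levels=55):
    """Aggregate memory estimate in GB for the whole mesh."""
    return n_columns * mb_per_column * (n_levels / ref_levels) / 1000.0


def nodes_needed(total_gb, usable_gb_per_node=45):
    """Minimum node count so that aggregate usable memory covers total_gb."""
    return math.ceil(total_gb / usable_gb_per_node)


if __name__ == "__main__":
    total = estimate_memory_gb(6488066, 268)
    print(f"estimated memory: {total:.0f} GB")
    print("standard nodes needed:", nodes_needed(total))
```

This is only a lower bound on node count; rounding up, as suggested above, leaves headroom.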

mgduda · Sep 22, 2021

When running the model itself (atmosphere_model) on Cheyenne, I don't think there's a reason to not use fully subscribed nodes (i.e., with 36 MPI ranks per node). In any case, 512 nodes should have given you enough aggregate memory that lack of memory may not be the cause of the model failure. If you happen to still have your working directory from a failed model run intact, I could take a look to see if there's any indication of the problem that I can identify.

jhegarty · Sep 23, 2021

Hi Michael,

Thank you. My run directory is /glade/work/hegarty/x5. I run MPAS with the script called run_mpas_200_gw_x5_cp.csh. I am writing the output to /glade/scratch/hegarty and move the log.atmosphere.0000.out file to that location after the job completes. There were no .err files generated for this particular run.

Once I get it running I will maximize the number of processors per node. But for now I wanted to keep the number of tasks as a multiple of 32 since partition files of that multiple were readily available.

Thanks again for taking a look.

Jen

mgduda · Sep 24, 2021

There are a couple of settings in your namelist.atmosphere file that I think you'll want to change:

config_dt - This is currently set at 20.0. For a mesh with a minimum horizontal grid distance of 3000 m, I would ordinarily start with a time step of 18 or even 15 seconds; but, given that you've got quite a few more vertical levels (and, in particular, thinner levels near the surface), you may want to begin with an even smaller timestep.
config_len_disp - This is currently set to 25000.0, and for the 15 - 3 km mesh, this should probably be set to 3000.0.
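
As a rough illustration of the two settings above, here is a small sketch. The factor of 5 to 6 seconds per km of minimum grid spacing is inferred from the 15 to 18 s suggestion for a 3000 m mesh and is an assumption; the function names are hypothetical. With many thin near-surface levels, a smaller time step than this range may be needed.

```python
# Sketch: starting values for config_dt and config_len_disp derived from
# the minimum horizontal grid distance of the mesh.
# Assumption: 5-6 s per km of minimum spacing, inferred from the
# 15-18 s suggestion for a 3000 m mesh. Names are illustrative only.


def suggested_dt_range_s(min_dx_m, low_factor=5.0, high_factor=6.0):
    """Return (conservative, typical) starting time steps in seconds."""
    dx_km = min_dx_m / 1000.0
    return (low_factor * dx_km, high_factor * dx_km)


def suggested_len_disp_m(min_dx_m):
    """config_len_disp set to the minimum grid distance in meters."""
    return float(min_dx_m)


if __name__ == "__main__":
    low, high = suggested_dt_range_s(3000.0)
    print(f"config_dt starting range: {low} to {high} s")
    print(f"config_len_disp: {suggested_len_disp_m(3000.0)} m")
```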

I'd second mcurry's suggestion to start with a "standard" configuration of the 15 - 3 km mesh with 55 vertical layers, and work your way toward your final configuration with 268 vertical levels.
