
Running Model with Dense Meshes

nakulkarle

New member
Hello Everyone,

I am a little lost and need some guidance.
What changes do I need to make regarding the dense meshes before running the init_atmosphere component? A note is provided on the mesh download webpage, but unfortunately, I did not fully understand it. I want to process static and terrestrial fields for the 12-km quasi-uniform mesh.

When I run "init_atmosphere", no error file is generated. However, the job doesn't seem to run to completion. This is what "log.init_atmosphere.0000.out" has in it:
Code:
----------------------------------------------------------------------
Beginning MPAS-init_atmosphere Output Log File for task 0 of 1
Opened at 2023/06/26 16:38:33
----------------------------------------------------------------------
Using default single-precision reals
Reading namelist from file namelist.init_atmosphere
Reading streams configuration from file streams.init_atmosphere
Found mesh stream with filename template x1.4096002.grid.nc
Using default io_type for mesh stream
** Attempting to bootstrap MPAS framework using stream: input
Bootstrapping framework with mesh fields from input file 'x1.4096002.grid.nc'
WARNING: Attribute parent_id not found in x1.4096002.grid.nc
WARNING: Setting parent_id to ''
* Requested field lbc_scalars is deactivated due to packages, or is a scratch variable.
* Requested field lbc_u is deactivated due to packages, or is a scratch variable.
* Requested field lbc_w is deactivated due to packages, or is a scratch variable.
* Requested field lbc_rho is deactivated due to packages, or is a scratch variable.
* Requested field lbc_theta is deactivated due to packages, or is a scratch variable.
Parsing run-time I/O configuration from streams.init_atmosphere ...
----- found immutable stream "input" in streams.init_atmosphere -----
    filename template: x1.4096002.grid.nc
    filename interval: none
    direction: input
    reference time: initial_time
    record interval: -
    input alarm: initial_only
----- found immutable stream "output" in streams.init_atmosphere -----
    filename template: x1.4096002.static.nc
    filename interval: none
    direction: output
    reference time: initial_time
    record interval: -
    output alarm: initial_only
    package: initial_conds
----- found immutable stream "surface" in streams.init_atmosphere -----
    filename template: x1.40962.sfc_update.nc
    filename interval: none
    direction: output
    reference time: initial_time
    record interval: -
    output alarm: 86400
    package: sfc_update
----- found immutable stream "lbc" in streams.init_atmosphere -----
    filename template: lbc.$Y-$M-$D_$h.$m.$s.nc
    filename interval: 3:00:00
    direction: output
    reference time: initial_time
    record interval: -
    output alarm: 3:00:00
    package: lbcs
----- done parsing run-time I/O from streams.init_atmosphere -----

** Validating streams

Reading dimensions from input streams ...
----- reading dimensions from stream 'input' using file x1.4096002.grid.nc
    nCells = 4096002
    nEdges = 12288000
    nVertices = 8192000
    TWO = 2
    maxEdges = 6
    maxEdges2 = 12
    vertexDegree = 3
----- done reading dimensions from input streams -----
Processing decomposed dimensions ...
----- done processing decomposed dimensions -----
Assigning remaining dimensions from definitions in Registry.xml ...
    THREE = 3
    FIFTEEN = 15
    TWENTYONE = 21
    R3 = 3
    nVertLevels = 55 (config_nvertlevels)
    nSoilLevels = 4 (config_nsoillevels)
    nFGLevels = 38 (config_nfglevels)
    nFGSoilLevels = 4 (config_nfgsoillevels)
    nVertLevelsP1 = 56
    nMonths = 12 (config_months)
----- done assigning dimensions from Registry.xml -----

Thank you.
 
I guess you are trying to run global MPAS with the 12-km mesh; please let me know if I am wrong. Before running the model, I assume you have successfully compiled MPAS. Then please follow the steps below:

(1) download the 12-km global mesh (which you have done)
(2) create a graph.info partition file for a parallel run (see the sketch after this list)
(3) modify namelist.init_atmosphere and streams.init_atmosphere for static data generation
(4) run init_atmosphere_model to produce the static data file
(5) modify namelist.init_atmosphere and streams.init_atmosphere for initial condition generation
(6) run init_atmosphere_model to produce initial data for the global MPAS run
(7) modify namelist.atmosphere and streams.atmosphere
(8) run atmosphere_model
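
For step (2), partition files are typically generated with METIS. A minimal sketch, where the task count of 1152 is only an example and should match the number of MPI tasks in your job:
Code:
# partition the mesh connectivity graph for 1152 MPI tasks;
# this writes x1.4096002.graph.info.part.1152 alongside the input file
gpmetis -minconn -contig -niter=200 x1.4096002.graph.info 1152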

If you have any issues during the process, please send us the log file that includes the error message, along with your namelist and streams files, so we can take a look.
 
Hello,
Thank you so much for the quick response. I sincerely appreciate it.
I want to run a limited-area simulation over the south-central USA (Texas and New Mexico), hence the 12-km mesh.
I am trying to run the init_atmosphere model to produce the static file. As per the steps mentioned above, I also created a graph.info for a parallel run.
However, as seen in the attached log file, the init_atmosphere model seems stuck and has not progressed for 30 minutes or more.

I am attaching the log.init_atmosphere.0000.out, namelist.init_atmosphere, and streams.init_atmosphere here.
I am sure there is something I am missing, hence the issue. By the way, no error file was generated in this case either. I appreciate your guidance.
Thank you.
 

Attachments

  • nkarle.zip (3.4 KB)
One question I have:

Did you use the limited_area tool to produce the regional grid and graph.info?
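
For reference, the regional grid and its graph.info are usually produced with the MPAS-Limited-Area tool. A hypothetical invocation, where texas.custom is a placeholder points file describing your region of interest:
Code:
# cuts a regional grid and a matching graph.info out of the global mesh;
# texas.custom is a hypothetical points file describing the region boundary
create_region texas.custom x1.4096002.grid.nc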

Also, would you please clarify step-by-step what you have done so far?

Thanks.
 
Hello,

1. I downloaded the 12-km uniform mesh from the webpage and prepared the graph.info partition files for a parallel run. Since I was not sure which to use, I prepared "graph.info.part.2048", "part.256", "part.1024", "part.96", and "part.36".
2. Modified the "namelist.init_atmosphere" and "streams.init_atmosphere".
3. Submitted the job to run "init_atmosphere_model".
4. Even after an hour, the job was stuck at the same point, as seen in the previously attached log file.
5. No error file was generated in the process.

Thank you.
 
Your namelist and streams files both look fine, and there is no error message in your log file. So I suspect this could be a machine issue, or the I/O gets stuck for some reason I don't know yet.
Based on your namelist settings, I guess you are running this case on Cheyenne. Please let me know your working directory; I will try to repeat your case and see whether I get the same issue.
 
Yes, I am running it on Cheyenne.
The following is my working directory,

/glade/work/nkarle/MPAS/12km_uniform
 
Hi,

Thank you for the information. I repeated your case in Cheyenne. Please see the case in /glade/scratch/chenming/mpas-help.

Below is what I found:

(1) In my first try, I used your namelist and streams files. The job hung for more than 10 minutes with the log.init file exactly as shown in your post; I suspect there is insufficient memory to run this case with a single processor.

(2) In my second try, I created the file "x1.4096002.graph.info.part.1152" and ran in parallel mode using 32 nodes. The job silently stopped with the error message "MPT: shepherd terminated: r9i2n8.ib0.cheyenne.ucar.edu - job aborting".
I believe this is more likely a machine issue.

I know that Cheyenne has some issues that may lead to the above error. CISL will start fixing this problem on August 7. Until they fix it, I guess we will repeatedly experience this error.

I can confirm that what you did is correct. Let's see how it works after CISL fixes the machine issue. Please keep me updated on any progress or issues regarding this case. Thanks.
 
Hi Ming,

I was just wondering, by any chance, if you know whether CISL has already fixed the machine issue.
Thank you.
 
No, I don't think so. The CISL Bulletin indicates that they will start looking at this issue after August 8.
Hi Ming,

I hope this message finds you well.
Assuming CISL has fixed the problem, I ran the 12-km mesh again and experienced the same technical issue, so I thought I should inform you about it.
Could you kindly advise how to proceed from here?
Thank you.
 
I think there may be a few issues, a couple of which are specific to MPAS-A releases prior to v8.0.

For MPAS v7.3 and earlier, the following are important:

1) Ensure that the following vertical dimensions are all set to 1 when processing static fields, to avoid allocating large 3-d atmospheric fields:
Code:
&dimensions
    config_nvertlevels = 1
    config_nsoillevels = 1
    config_nfglevels = 1
    config_nfgsoillevels = 1
/

2) Use a special CVT partition file when processing static fields in parallel. In the case of the 12-km quasi-uniform mesh, this implies that your graph partition file prefix should be specified as:
Code:
&decomposition
    config_block_decomp_file_prefix = 'x1.4096002.cvt.part.'
/

3) Under-subscribe nodes if processing the GWDO static fields. Each MPI rank will allocate an additional ~3.7 GB of memory to hold the global terrain dataset, and another ~3.7 GB of memory for the global land-use dataset. On Cheyenne, you may need to use just 4 MPI ranks per node, since each regular batch node has just 45 GB of usable memory (a sketch of such a job script follows below).
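
As an illustration of under-subscribing, a hedged PBS sketch for Cheyenne is below; the job name, node count, project code, and wall time are placeholders, and the partition file must match the total MPI rank count (here 8 x 4 = 32):
Code:
#!/bin/bash
#PBS -N init_atm_static
#PBS -A <project_code>
#PBS -q regular
#PBS -l walltime=01:00:00
# 36 cores per node, but only 4 MPI ranks per node, leaving memory headroom
# for the ~7.4 GB of global terrain and land-use data held by each rank
#PBS -l select=8:ncpus=36:mpiprocs=4

# with 32 total ranks, config_block_decomp_file_prefix must point at a
# matching graph.info.part.32 file
mpiexec_mpt ./init_atmosphere_model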

If you're working with MPAS v8.0, (1) and (2) above are no longer necessary; but, because the processing of the GWDO sub-grid orography fields on the native unstructured mesh still requires a substantial amount of memory on each MPI rank, you will likely need to under-subscribe Cheyenne nodes.
 
Thank you for the information. I will definitely try it on v7.3.
Also, I tried compiling the init_atmosphere model of v8 but experienced the following errors.
Not sure what I am doing wrong here. Can you please help?

Code:
f951: Fatal Error: Reading module ‘../external/esmf_time_f90/esmf.mod’ at line 1 column 2: Unexpected EOF
compilation terminated.
Makefile:119: recipe for target 'mpas_derived_types.o' failed
make[3]: *** [mpas_derived_types.o] Error 1
make[3]: Leaving directory '/glade/work/nkarle/MPAS_v08/MPAS-Model/src/framework'
Makefile:31: recipe for target 'frame' failed
make[2]: *** [frame] Error 2
make[2]: *** Waiting for unfinished jobs....
make[6]: Leaving directory '/glade/work/nkarle/MPAS_v08/MPAS-Model/src/tools/input_gen'
(make -j 1 streams_gen CPPFLAGS="-D_MPI -DCORE_INIT_ATMOSPHERE -DMPAS_NAMELIST_SUFFIX=init_atmosphere -DMPAS_EXE_NAME=init_atmosphere_model -DSINGLE_PRECISION -DMPAS_NATIVE_TIMERS -DMPAS_GIT_VERSION=v8.0.1 -DMPAS_BUILD_TARGET=gfortran -DMPAS_PIO_SUPPORT -DUSE_PIO2" CPPINCLUDES="-I/glade/u/apps/ch/opt/pio/2.5.5/mpt/2.25/gnu/10.1.0//include -I/glade/u/apps/ch/opt/netcdf-mpi/4.8.1/mpt/2.25/gnu/10.1.0//include -I/glade/u/apps/ch/opt/pnetcdf/1.12.2/mpt/2.25/gnu/10.1.0//include")
make[6]: Entering directory '/glade/work/nkarle/MPAS_v08/MPAS-Model/src/tools/input_gen'
make[6]: warning: -jN forced in submake: disabling jobserver mode.
(cd ../../external/ezxml; make CFLAGS="-O3 -DSINGLE_PRECISION " OBJFILE="ezxml_tools.o")
make[7]: Entering directory '/glade/work/nkarle/MPAS_v08/MPAS-Model/src/external/ezxml'
make[7]: 'ezxml_tools.o' is up to date.
make[7]: Leaving directory '/glade/work/nkarle/MPAS_v08/MPAS-Model/src/external/ezxml'
gcc -D_MPI -DCORE_INIT_ATMOSPHERE -DMPAS_NAMELIST_SUFFIX=init_atmosphere -DMPAS_EXE_NAME=init_atmosphere_model -DSINGLE_PRECISION -DMPAS_NATIVE_TIMERS -DMPAS_GIT_VERSION=v8.0.1 -DMPAS_BUILD_TARGET=gfortran -DMPAS_PIO_SUPPORT -DUSE_PIO2 -O3 -DSINGLE_PRECISION -I../../external/ezxml -o streams_gen streams_gen.o test_functions.o ../../external/ezxml/ezxml_tools.o
make[6]: Leaving directory '/glade/work/nkarle/MPAS_v08/MPAS-Model/src/tools/input_gen'
make[5]: Leaving directory '/glade/work/nkarle/MPAS_v08/MPAS-Model/src/tools/input_gen'
make[4]: Leaving directory '/glade/work/nkarle/MPAS_v08/MPAS-Model/src/tools'
make[3]: Leaving directory '/glade/work/nkarle/MPAS_v08/MPAS-Model/src/tools'
make[2]: Leaving directory '/glade/work/nkarle/MPAS_v08/MPAS-Model/src'
Makefile:1242: recipe for target 'mpas_main' failed
make[1]: *** [mpas_main] Error 2
make[1]: Leaving directory '/glade/work/nkarle/MPAS_v08/MPAS-Model'
Makefile:382: recipe for target 'gfortran' failed
make: *** [gfortran] Error 2
 
I'm not sure what the issue might be, unfortunately. I've just tried compiling on Cheyenne with the following modules loaded, which I think may roughly match what you may be using:
Code:
Currently Loaded Modules:
  1) ncarenv/1.3   2) gnu/10.1.0   3) ncarcompilers/0.5.0   4) mpt/2.25   5) netcdf-mpi/4.8.1   6) pnetcdf/1.12.2   7) pio/2.5.5
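If it's useful for comparison, loading that stack from the command line might look roughly like this (the load order below is an assumption; your default environment may require swapping modules instead):
Code:
module load ncarenv/1.3 gnu/10.1.0 ncarcompilers/0.5.0 mpt/2.25
module load netcdf-mpi/4.8.1 pnetcdf/1.12.2 pio/2.5.5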
Compilation of the init_atmosphere core was successful with the following build command:
Code:
make -j4 gfortran CORE=init_atmosphere PRECISION=single
 
As an aside, you will likely get better model simulation rates if you use the Intel compilers on Cheyenne, although compilation time will generally be longer compared with the GNU compilers.
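
If you do try the Intel compilers, the build command should be the same apart from the build target; a sketch, assuming the 'ifort' target in the top-level Makefile:
Code:
make -j4 ifort CORE=init_atmosphere PRECISION=single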
 
Thank you for the prompt response.
Yes, I have the same modules loaded. But compilation is still failing for some reason.
 
If you still have your working directory on /glade, I can take a look to see if there are differences between the `esmf.mod` file in your build directory and mine.

In a separate directory, though, it may be easiest to just switch to the Intel compilers. Here's the set of modules that have worked for me:
Code:
Currently Loaded Modules:
  1) ncarenv/1.3   2) intel/2022.1   3) ncarcompilers/0.5.0   4) mpt/2.25   5) netcdf-mpi/4.8.1   6) pnetcdf/1.12.2   7) pio/2.5.5
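
Whichever compiler you use, it's also worth clearing artifacts from the failed gfortran build before recompiling in the same tree, so that no stale object or .mod files are picked up. A minimal sketch using the standard clean target:
Code:
# remove objects and module files left over from the failed build
make clean CORE=init_atmosphere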
 
Yes, my working directory is on /glade/work/nkarle/MPAS_v08.
Thank you for sharing the module list. I will compile the code in a separate folder with Intel compilers.
Thank you! Will keep you posted.
 
I tried using the Intel compilers but still received the same error messages. Not sure what's going on.
 

Attachments

  • Screenshot 2023-09-01 at 11.50.57 AM.png (364.4 KB)
Thanks for the path. Interestingly, the sizes of our esmf.mod files are substantially different:
-rw-r--r-- 1 duda ncar 18713 Sep 1 09:44 esmf.mod
versus
-rw-r--r-- 1 nkarle ncar 39546 Aug 30 09:30 esmf.mod
An octal dump of the files suggests that they were created with different compilers, so I'm wondering whether there might be something about your shell environment that's causing a different compiler to be used than the expected gfortran 10.1.0.

Do you by any chance have any conda environments that are active? Or were you working in a conda environment that was deactivated before compiling MPAS? I have heard from colleagues that there have been some bad interactions between conda environments on Cheyenne (even after they've been deactivated) and the modules for various compilers.
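
A couple of quick checks that might narrow this down (the paths and commands are suggestions, assuming a standard Cheyenne login shell and the MPAS-Model directory as the working directory):
Code:
# confirm which gfortran the shell actually resolves; a conda environment
# can shadow the module-provided compiler
which gfortran && gfortran --version

# peek at the first bytes of the module file; gfortran writes gzip-compressed
# .mod files, so a different byte signature suggests another compiler made it
od -c src/external/esmf_time_f90/esmf.mod | head -n 2

# list any conda environments that may still be influencing the shell
conda env list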
 