
static.nc fields with erroneous values

kalassak

New member
Hello! Some introduction: I have been trying to set up and run the MPAS-Atmosphere model (following the tutorial), and everything seemed to be proceeding relatively well until I ran atmosphere_model. It crashed without terminating cleanly, though it did write a log.atmosphere.xxxx.err file with the following contents:

Code:
----------------------------------------------------------------------
Beginning MPAS-atmosphere Error Log File for task       2 of       4
    Opened at 2022/10/17 20:16:25
----------------------------------------------------------------------

ERROR: Error in compute_layer_mean: pressure should increase with index

Searching around on the forum led me to investigate whether the initial conditions had any issues, and I eventually traced the problem back to the x1.10242.static.nc file, which does not seem to be generated correctly. However, init_atmosphere_model does not throw any errors when it runs.

So here is the trouble I am running into with the static.nc file:
Various variables within the file are filled with erroneous, nonsensical values, to the point that I cannot plot them (without some extra work) with the MPAS-Plotting scripts, because even the data defining the cell locations and bounds are incorrect.

For example, I tried printing out the values of latVertex, and they do not match the values in the grid.nc file. Instead, they jump around randomly and regularly show extreme powers of 10, like 10^17. I suspect they are supposed to simply be copied over from the grid.nc file and match.

Nonsense values of latVertex from my static.nc:
Code:
[2.7310673115060634e+17 1.7287826538085938 -9.753121686919578e+16 ...
 -1.6915831565856934 5.242543056644729e-31 -1.6802843809127808]

The more reasonable values from grid.nc latVertex:
Code:
[ 0.45756539  0.44423887  0.45756539 ... -0.18871053 -0.91843818
 -0.91843819]
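
(For reference, a quick way to dump these values for a side-by-side comparison, assuming the ncdump utility from the NetCDF-C installation is on your PATH:)

Bash:
# Print the data section of latVertex from each file for an eyeball comparison
ncdump -v latVertex x1.10242.grid.nc | sed -n '/^data:/,$p' | head -n 8
ncdump -v latVertex x1.10242.static.nc | sed -n '/^data:/,$p' | head -n 8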

Similar issues appear with ter (terrain height) in static.nc. The max value is something like 10^38 and min around 10^-36, seemingly random:

ter_0_0.png

(please excuse the awkward formatting, I quickly resized the plots for ease of viewing and made no other changes)

What is interesting is that landmask appears to transfer just fine, perhaps because it's an integer field; maybe the issue lies in handling floating-point values?

landmask_0_0.png

To try to make sense of the data and plot it visually, I had to generate the patches from the grid.nc file to produce the above plots, as the data is so mangled in static.nc and init.nc!

Other potential indicators of the issue include the fact that log.init_atmosphere.0000.out displays all zeroes for the timing statistics, despite the run actually taking around an hour to complete. There is no log.init_atmosphere.0000.err file, and the model seems to think everything went fine.

Any help here is appreciated, thank you!
 

Attachments

  • streams.init_atmosphere.txt (922 bytes)
  • namelist.init_atmosphere.txt (1.3 KB)
  • log.init_atmosphere.0000.out.txt (199.9 KB)
 
Everything in your namelist.init_atmosphere and streams.init_atmosphere files looks reasonable to me. I would agree that the zero-valued timing statistics are an indicator that something isn't quite right.

Which compilers (GCC, Intel, NVHPC, etc.) and compiler versions are you using? Also, which versions of the I/O libraries (HDF5, NetCDF-C and NetCDF-Fortran, Parallel-NetCDF, and PIO) are you using?

At least at the static interpolation stage, MPI shouldn't be an issue; but since we use MPI_Wtime for timing information and since the timing statistics seem wrong, perhaps there's something odd about the MPI library that's translating to issues in, e.g., the Parallel-NetCDF library, which is used by default to write output streams?
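
(For reference, these can usually be queried from the utilities installed alongside the libraries; a rough sketch, assuming their bin directories are on your PATH and the tool names match your installs:)

Bash:
gcc --version              # compiler suite
mpichversion               # MPICH build details
h5dump --version           # HDF5
nc-config --version        # NetCDF-C
nf-config --version        # NetCDF-Fortran
pnetcdf-config --version   # Parallel-NetCDF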
 
I am using gcc, g++, and gfortran. I believe the version is 7.5.0? (given by gcc --version)

For the libraries, I used versions directly from Index of /people/duda/files/mpas/sources, as I was having issues getting the model to compile when trying to use the latest versions of everything (or the versions I already had on my system). So, that would be:
- mpich-3.3.1
- zlib-1.2.11
- hdf5-1.10.5
- pnetcdf-1.11.2
- netcdf-c-4.6.3 (as 4.7.0 had a bug involving a curl requirement that did not go away even after installing curl)
- netcdf-fortran-4.5.2 (I opted for the netcdf-fortran package that was added to the directory around the same time as the netcdf-c 4.6.3 package)
- pio-2.4.4 (I believe? I also had various issues with this one, as I could not figure out how to ./configure it, even for the v1 releases, so I used cmake here)

The only thing I can think of regarding MPI is that I already have an installation of it on my system for WRF, and maybe some wires got crossed with multiple versions showing up in my environment variables. In earlier attempts I tried using that install and would have placed it in my PATH and such, but I'm pretty sure that on my latest attempt I started from an entirely blank slate, so that shouldn't have been an issue. Unless there is some package I installed a long time ago (with apt-get or something) that is confusing everything; I'm not very familiar with how all of that sort of "back end", seemingly invisible stuff works.
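
(A rough sketch of how one might check which MPI installation and which libraries are actually being picked up, in case that helps rule out the crossed-wires possibility; the executable name below is the one from this thread.)

Bash:
which mpicc mpif90      # which MPI compiler wrappers are first on PATH
mpicc -show             # underlying compiler and flags the MPICH wrapper invokes
mpif90 -show
# See which shared libraries the built executable actually links against
ldd init_atmosphere_model | grep -Ei 'mpi|netcdf|pnetcdf|hdf5|pio'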
 
Apologies for the delay in replying! All of the above seems reasonable enough. I'm still puzzled by the zero-valued timing statistics, but setting those aside and assuming that there may be an issue in writing output files through the Parallel-NetCDF library (which is the default), it might be worth trying to write the static file through the serial NetCDF library. In your streams.init_atmosphere file, could you try adding io_type="netcdf" to the definition of the "output" stream, i.e.,

Code:
<immutable_stream name="output"
                  type="output"
                  io_type="netcdf"
                  filename_template="x1.10242.static.nc"
                  packages="initial_conds"
                  output_interval="initial_only" />

and running the init_atmosphere_model program again?
 
Adding io_type="netcdf" and rerunning init_atmosphere_model did not appear to fix the issue. The values I'm getting from latVertex are still the same:

Code:
[2.7310673115060634e+17 1.7287826538085938 -9.753121686919578e+16 ...
 -1.6915831565856934 5.242543056644729e-31 -1.6802843809127808]
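
(One small sanity check, assuming ncdump is available: confirm that the file really was rewritten by the new run, e.g.)

Bash:
ls -l x1.10242.static.nc       # modification time should reflect the re-run
ncdump -k x1.10242.static.nc   # reports the on-disk NetCDF format of the file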
 
Just wanted to throw an idea into this thread.

It is possible that the older libraries are not compatible with each other, leading to an error that propagates when the data are written.

If you have the ability to create a separate user account or a virtual machine, give this script a try; I know those library versions are compatible with each other. Maybe that's the issue?


Post in thread 'ERROR in build MPAS'
 
It may be worth trying the latest version of the PIO library (2.5.9 as of today) just on the off chance that there's a bug there that's leading to incorrect output values. If you've already been able to install PIO 2.4.4, I think it would take just a few minutes to install 2.5.9. Something like the following should work, assuming the NETCDF, PNETCDF and INSTALL_PREFIX environment variables have been set:

Bash:
git clone https://github.com/NCAR/ParallelIO
cd ParallelIO
git checkout -b pio-2.5.9 pio2_5_9
export PIOSRC=`pwd`
cd ..
mkdir pio
cd pio
export CC=mpicc
export FC=mpif90
cmake -DNetCDF_C_PATH=$NETCDF -DNetCDF_Fortran_PATH=$NETCDF \
      -DPnetCDF_PATH=$PNETCDF -DHDF5_PATH=$NETCDF \
      -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX \
      -DPIO_USE_MALLOC=ON -DCMAKE_VERBOSE_MAKEFILE=1 -DPIO_ENABLE_TIMING=OFF $PIOSRC
make
make check
make install
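
Once PIO 2.5.9 is installed, the MPAS build would also need to pick it up before recompiling; a sketch, assuming a gfortran build and that your MPAS build locates the library through the PIO environment variable:

Bash:
# Point the MPAS build at the new PIO install, then rebuild from a clean state
export PIO=$INSTALL_PREFIX
cd MPAS-Model                      # path to your MPAS source tree
make clean CORE=init_atmosphere
make gfortran CORE=init_atmosphere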
 
It could also be interesting to print out the range of terrain height values within the code. If this range looks reasonable, then we may be dealing with a file output issue; but if the computed values also show an unreasonable range like the one found in your static.nc file, then perhaps the issue lies elsewhere.

Adding a line of code like the following just after the terrain field has been computed around line 369 of mpas_init_atm_static.F should work:
Code:
call mpas_log_write('min/max ter = $r / $r', realArgs=[minval(ter(1:nCells)), maxval(ter(1:nCells))])
After recompiling the init_atmosphere core and re-running the static interpolation step, you should see a message like the following in your log file:
Code:
min/max ter = -27.0000 / 5112.49
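
(A sketch of the re-run and log check, assuming an MPI launch of a single task; adjust to however you normally run the init core:)

Bash:
# Re-run the static interpolation step, then look for the new message in the log
mpiexec -n 1 ./init_atmosphere_model
grep 'min/max ter' log.init_atmosphere.0000.out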
 
Finally sat down and tried the above suggestions: installed PIO 2.5.9, added the print statement, recompiled the init_atmosphere_model, and it appears to have worked? I got the same output for min/max ter, the timing statistics are now non-zero, and the values from latVertex match the original values from the grid.nc file!

I guess I will need to recompile the atmosphere model as well before I try to do anything more with it, but I've run out of time for working on this today. Will report back later if I'm successful or run into any more problems. Thank you!
 
Glad you were able to make some progress.
 
I can now confirm from plotting the output that it seems to have been successful! Now I can try out the variable-resolution grid I want to use and work on converting my WRF graphics scripts.

Plot from day 7 with a simple 240km uniform mesh grid:

surface_pressure_0_0.png

Thank you both again for your help!
 
Glad it worked out for you.
 