Segmentation Fault when Running BCs for MPAS

aroseman

I keep getting a strange segmentation fault when running BCs on MPAS V8.3.1. I have tried increasing the number of MPI processes (as seen in the attached .sh file) as well as undersubscribing, but it still happens. It also occurs at a random spot in the run: judging by log.init_atmosphere.0000.out, sometimes it stops where the attached example does, and sometimes it is able to output a few LBC files before crashing.

Note: I am using a regional 60-3 km mesh. I ran "gpmetis -minconn -contig -niter=200 ${name}.graph.info 256" on the graph file that was output with it to get a .graph.part.256 file. I then used the new mesh_scaling tool "scale_region" to scale the mesh by a factor of 3, giving a 20-1 km mesh.
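For anyone wanting to sanity-check a partition file like the one described above, here is a minimal sketch. It assumes the usual gpmetis output layout (one partition ID per line, one line per cell); the function name and the file name in the usage line are hypothetical.

```shell
#!/bin/sh
# Summarize a METIS-style partition file: one partition ID per line.
# Prints four numbers:
#   <cells> <distinct partition IDs> <smallest partition> <largest partition>
summarize_partition() {
    awk '
        NF > 0 { count[$1]++; total++ }   # skip blank lines
        END {
            min = -1; max = 0; parts = 0
            for (id in count) {
                parts++
                if (min < 0 || count[id] < min) min = count[id]
                if (count[id] > max) max = count[id]
            }
            if (min < 0) min = 0
            print total + 0, parts, min, max
        }
    ' "$1"
}

# Usage (hypothetical file name; substitute the real partition file):
# summarize_partition mymesh.graph.part.256
```

The first number should match nCells in the mesh and the second should match the number of MPI tasks, since MPAS looks for a partition file whose numeric suffix equals the task count. A very small minimum partition size may also be worth a look.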


Checking Memory:
qhist -j 3277247
==>

Job ID    User      Queue  Nodes  NCPUs  NGPUs  End      Mem    CPU    Elap
--------  --------  -----  -----  -----  -----  -------  -----  -----  ----
3277247   aroseman  cpu    4      512    0      01-1644  66.87  37.37  0.01

What should I try to figure out the cause and avoid the error?
 

Attachments

  • submit_mpas_256_BCs.sh.txt
    754 bytes · Views: 3
  • log.init_atmosphere.0000_BC_36.out.txt
    648 bytes · Views: 3
  • mpas_8.3.1_BCs.e3276877.txt
    4.3 KB · Views: 3
Although I don't have any good ideas as to why the init_atmosphere_model program may be randomly stopping with a segmentation fault, I did notice that you appear to be using a rather old compiler from the intel/2023.0.0 module on Derecho. It could be worth trying with either the intel/2025.1.0 module or with the gcc/12.4.0 module; perhaps best would be to try using the same modules described in the "0. Prerequisites and environment setup" section of the most recent MPAS-A tutorial practice guide:
Code:
module --force purge
module load ncarenv/24.12
module load craype/2.7.31
module load gcc/12.4.0
module load ncarcompilers/1.0.0
module load cray-mpich/8.1.29
module load parallel-netcdf/1.14.0
 
Thanks! I was using the 2024 one before. I will try redoing everything from scratch with the modules from the 2025 workshop. I will use the Intel option, since I am used to that. If that does not work, I will try compiling with gcc next.

This is very strange, though, since this issue has not occurred before with the exact same setup. Maybe it's something small that simply didn't show up previously.
 
Hi Michael,

I ran with the newest modules, using Intel, and was able to get through Static, ICs, and BCs!

module --force purge
module load ncarenv/24.12
module load craype/2.7.31
module load intel/2025.1.0
module load ncarcompilers/1.0.0
module load cray-mpich/8.1.29
module load parallel-netcdf/1.14.0
module load netcdf/4.9.2

Thanks for the help!
 
Unfortunately, when I then run the model with atmosphere_model, a segmentation fault occurs shortly after the start:

PBS Job Id: 3279777.desched1
Job Name: mpas_8.3.1_Run
Execution terminated
Exit_status=174
resources_used.cpupercent=18438
resources_used.cput=00:48:21
resources_used.mem=121042440kb
resources_used.ncpus=512
resources_used.vmem=60053264kb
resources_used.walltime=00:00:16
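To put the memory figure in the summary above into more familiar units, here is a small sketch. It assumes PBS reports "kb" as units of 1024 bytes; the function name is hypothetical.

```shell
#!/bin/sh
# Convert a PBS memory value given in kb (assumed to be 1024-byte units)
# to GiB, rounded to two decimals.
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / (1024 * 1024) }'
}

# resources_used.mem from the job summary, with the "kb" suffix removed.
kb_to_gib 121042440
```

This comes out to roughly 115 GiB summed over the whole job, which can then be compared against the memory available on the nodes the job ran on.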

I also noticed something quite strange when viewing the initial conditions with ncvis: there seems to be a region of NaN surface pressure in the center of the domain. It does not occur for every variable, but it does for others such as rho and relative humidity, and it extends through multiple vertical levels.

Note: I have used this data before without issue.
1.png
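A quick way to confirm NaN values like these without a viewer is to count them in the text dump of a variable. The sketch below assumes ncdump is available and that it prints NaN values as "NaN" or "NaNf"; the function name, the file name init.nc, and the variable names in the usage lines are assumptions, not taken from the attached files.

```shell
#!/bin/sh
# Count NaN values in the data section of `ncdump` output read from stdin.
# Only lines after "data:" are scanned, so a NaN fill-value attribute in
# the header is not counted.
count_nan() {
    awk '
        /^data:/ { in_data = 1; next }
        in_data  { n += gsub(/NaN/, "") }   # gsub returns the match count
        END      { print n + 0 }
    '
}

# Usage (hypothetical file and variable names):
# ncdump -v surface_pressure init.nc | count_nan
# ncdump -v rho init.nc | count_nan
```

A nonzero count for a field that should be defined everywhere points at the initialization step rather than at the model run itself.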
 

Attachments

  • model_directory.txt
    5.1 KB · Views: 1
  • log.atmosphere.0000.out.txt
    1.3 KB · Views: 1
  • mpas_8.3.1_Run.e3279777_mpi256.txt
    1.8 KB · Views: 1
  • submit_mpas_256_Run.sh.txt
    865 bytes · Views: 1
  • grid.png
    247.3 KB · Views: 1
Yes, will do that next.

Though I did find another possibility:

I am using the "scale_region.py" tool described on the "Meshes & Mesh Utilities" page of the MPAS Atmosphere documentation. I scaled the 60-3 km regional mesh to a 20-1 km mesh. (Note: I applied the tool to grid.nc, not static.nc.) I just ran the Static and ICs steps again with the unscaled 60-3 km regional mesh and found no missing points in the center extending up in the vertical, unlike in the 20-1 km case, where they appear in the central 1 km region.
==> It seems there is some issue with the scaling.
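One basic comparison between the scaled and unscaled grid files is whether their dimension sizes agree. The sketch below reads `ncdump -h` output and prints the size of a named dimension; the function name and the file names in the usage lines are hypothetical, and it is only an assumption here that scaling should leave the cell count unchanged.

```shell
#!/bin/sh
# Read `ncdump -h` output on stdin and print the size of one dimension.
# Stops at the "variables:" section so only dimension lines are examined.
dim_size() {
    awk -v name="$1" '
        /^variables:/            { exit }
        $1 == name && $2 == "="  { print $3; exit }
    '
}

# Usage (hypothetical file names):
# ncdump -h grid_60-3km.nc | dim_size nCells
# ncdump -h grid_20-1km.nc | dim_size nCells
```

If the two counts differ, the partition file generated for the original mesh would no longer match the scaled one.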
 

Attachments

  • 20-1km_GRID.png
    247.3 KB · Views: 1
  • 20-1km_grid_NCVIS.png
    210.5 KB · Views: 1
  • 60-3km_GRID.png
    255.9 KB · Views: 1
  • 60-3km_NCVIS.png
    143.9 KB · Views: 1
Here are the output logs for the compile with DEBUG=true
 

Attachments

  • init_atmosphere_compile.log
    98.2 KB · Views: 0
  • atmosphere_compile.log
    451.2 KB · Views: 0