
MPAS-8.2 crashes with the "convection_permitting" physics suite

g.bnz

Observations:
- no log*.err is generated
- works with the "mesoscale_reference" suite
- both suites are working with MPAS-8.1
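For reference, a minimal sketch of toggling between the two cases when reproducing this, assuming the suite is selected via the config_physics_suite option in the &physics group of namelist.atmosphere (the suite names come from the observations above):

# Switch from the working suite to the crashing one and rerun;
# config_physics_suite and the &physics group are assumptions based on
# the standard MPAS-Atmosphere namelist layout.
sed -i "s/config_physics_suite = 'mesoscale_reference'/config_physics_suite = 'convection_permitting'/" namelist.atmosphere
mpiexec -n 1 --allow-run-as-root atmosphere_model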

The error:
mpiexec -n 1 --allow-run-as-root atmosphere_model
free(): invalid pointer

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0 0x7fe8228a48c2 in ???
#1 0x7fe8228a3a55 in ???
#2 0x7fe82249c04f in ???
#3 0x7fe8224eae2c in ???
#4 0x7fe82249bfb1 in raise
#5 0x7fe822486471 in abort
#6 0x7fe8224df42f in ???
#7 0x7fe8224f47a9 in ???
#8 0x7fe8224f6533 in ???
#9 0x7fe8224f8e8e in cfree
#10 0x561f7d99c52c in __bl_mynn_MOD_bl_mynn_run
#11 0x561f7d9c7217 in __module_bl_mynn_MOD_mynn_bl_driver
#12 0x561f7d88748f in __mpas_atmphys_driver_pbl_MOD_driver_pbl
#13 0x561f7d837cec in __mpas_atmphys_driver_MOD_physics_driver._omp_fn.8
#14 0x7fe82267dd8d in ???
#15 0x7fe8224e9133 in ???
#16 0x7fe822568a3f in __clone
#17 0xffffffffffffffff in ???
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node mpas-8-2-c01 exited on signal 6 (Aborted).
 

Attachments

  • log.atmosphere.0000.out.txt (17.8 KB)
  • namelist.atmosphere.txt (2 KB)
Thanks for the error report. Could you please try another run after compiling with `DEBUG=true` and share that backtrace?
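For reference, a minimal sketch of such a rebuild, assuming the standard MPAS make interface (the gfortran target name is an assumption):

# Rebuild the atmosphere core with debug flags; DEBUG=true enables
# bounds checking and source-line backtraces in the resulting binary.
make clean CORE=atmosphere
make gfortran CORE=atmosphere DEBUG=true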
 
Is this the one you need?
╰─➤ mpiexec -n 1 --allow-run-as-root atmosphere_model
At line 464 of file bl_mynn.F90
Fortran runtime error: Index '668' of dimension 1 of array 'maxwidth' above upper bound of 667

Error termination. Backtrace:
#0 0x7fd3554a28c2 in ???
#1 0x7fd3554a33b9 in ???
#2 0x7fd3554a3949 in ???
#3 0x56527aeed854 in __bl_mynn_MOD_bl_mynn_run
at /workspaces/mpas_container/src/MPAS-Model/src/core_atmosphere/physics/physics_mmm/bl_mynn.F90:464
#4 0x56527afc831a in __module_bl_mynn_MOD_mynn_bl_driver
at /workspaces/mpas_container/src/MPAS-Model/src/core_atmosphere/physics/physics_wrf/module_bl_mynn.F:593
#5 0x56527ac54bbe in __mpas_atmphys_driver_pbl_MOD_driver_pbl
at /workspaces/mpas_container/src/MPAS-Model/src/core_atmosphere/physics/mpas_atmphys_driver_pbl.F:959
#6 0x56527abcec19 in __mpas_atmphys_driver_MOD_physics_driver._omp_fn.8
at /workspaces/mpas_container/src/MPAS-Model/src/core_atmosphere/physics/mpas_atmphys_driver.F:315
#7 0x7fd355374d8d in ???
#8 0x7fd3551e0133 in ???
#9 0x7fd35525fa3f in ???
#10 0xffffffffffffffff in ???
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[13631,1],0]
Exit code: 2
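To see the statement flagged by the bounds check, one can print the reported source lines directly (the path and line number are taken from the backtrace above):

# Print a few lines around bl_mynn.F90:464, where the out-of-bounds
# access to 'maxwidth' is reported.
sed -n '460,468p' /workspaces/mpas_container/src/MPAS-Model/src/core_atmosphere/physics/physics_mmm/bl_mynn.F90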
 
I don't think so, but it could be worth trying a different number of MPI ranks.

It might also be worth trying this setup without OpenMP.
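A minimal sketch of an MPI-only rebuild, assuming OpenMP was enabled in the original build via the OPENMP=true make option:

# MPAS builds without OpenMP threading unless OPENMP=true is passed to
# make, so a clean rebuild that omits it yields an MPI-only executable.
make clean CORE=atmosphere
make gfortran CORE=atmosphere
mpiexec -n 1 --allow-run-as-root atmosphere_model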
 
Removing OpenMP does the trick.
MPAS v8.2.1

Also, it looks like the integration time per iteration drops almost twofold. I observe the same CPU utilization, and the number of threads was equal to the hardware threads available on the system.
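If the OpenMP build is revisited, one way to rule out oversubscription is to pin the thread count explicitly so ranks times threads does not exceed the hardware (the value below is purely illustrative):

# Cap OpenMP threads per MPI rank; 4 is an illustrative value, not a
# recommendation for this case.
export OMP_NUM_THREADS=4
mpiexec -n 1 --allow-run-as-root atmosphere_model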

Time to try OpenACC.
 
That's some good news!

Just to note that we're in the early stages of the OpenACC port, so you will likely not see better performance yet.
 