Failure in model integration of GPU-enabled MPAS

louistse0305 · May 26, 2021

Hi everyone,

I compiled the GPU-enabled MPAS following the documentation at https://mpas-dev.github.io/atmosphere/OpenACC/index.html
using OpenMPI v3.1.3 and PGI Compiler 19.10.

Compilation went fine, and so did the static and init runs, but a segmentation fault occurs during model integration (see attached files):
View attachment error_messages.txt
View attachment log.atmosphere.role02.0000.out.txt
View attachment log.atmosphere.role01.0000.out.txt

From the log files, it seems that the model crashed at the very beginning, just after the radiation call. I suspect that the tendencies produced by radiation could not be transferred to the GPUs in the first dynamics time step.
I am running on a Supermicro SuperServer 1029GQ-TVRT with 4 Tesla V100 GPUs:
https://www.supermicro.com/products/system/1U/1029/SYS-1029GQ-TVRT.cfm
I am wondering whether this is a software problem (say, an improper installation of MPI or other libraries) or a hardware problem (say, the two CPUs on the server being unable to communicate with the 4 GPUs).

Here are the commands for running the case:

Code:

export MPAS_DYNAMICS_RANKS_PER_NODE="24"
export MPAS_RADIATION_RANKS_PER_NODE="16"
gpmetis -minconn -contig -niter=200 480km.graph.info ${MPAS_DYNAMICS_RANKS_PER_NODE}
gpmetis -minconn -contig -niter=200 480km.graph.info ${MPAS_RADIATION_RANKS_PER_NODE}
mpirun -np 40 ./atmosphere_model &
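
A minimal sketch of a pre-launch sanity check for the commands above, assuming a single node and that the total number of MPI ranks is meant to equal the dynamics ranks plus the radiation ranks (24 + 16 = 40 here). The names GRAPH_FILE, TOTAL_RANKS and MISSING are hypothetical helpers introduced for illustration; the partition file naming relies on gpmetis writing its output to <graph file>.part.<nparts>.

```shell
#!/bin/sh
# Hypothetical pre-launch check; the rank counts default to the values used above.
MPAS_DYNAMICS_RANKS_PER_NODE="${MPAS_DYNAMICS_RANKS_PER_NODE:-24}"
MPAS_RADIATION_RANKS_PER_NODE="${MPAS_RADIATION_RANKS_PER_NODE:-16}"
GRAPH_FILE="${GRAPH_FILE:-480km.graph.info}"   # assumed graph file name

# Assumption: on a single node, total ranks = dynamics ranks + radiation ranks.
TOTAL_RANKS=$((MPAS_DYNAMICS_RANKS_PER_NODE + MPAS_RADIATION_RANKS_PER_NODE))
echo "total ranks expected by mpirun -np: ${TOTAL_RANKS}"

# gpmetis writes each partition to <graph file>.part.<nparts>; report any that are absent.
MISSING=0
for n in "$MPAS_DYNAMICS_RANKS_PER_NODE" "$MPAS_RADIATION_RANKS_PER_NODE"; do
    if [ ! -f "${GRAPH_FILE}.part.${n}" ]; then
        echo "missing partition file: ${GRAPH_FILE}.part.${n}"
        MISSING=$((MISSING + 1))
    fi
done
echo "missing partition files: ${MISSING}"
```

If both variables are exported correctly, the first line printed should match the value passed to mpirun -np; an unset or mistyped variable would show up here as a default value or as a missing partition file.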

Any comments are welcome. Thank you!
