Failure in model integration of GPU-enabled MPAS

Questions about and discussion of the GPU-enabled MPAS-Atmosphere branch.
louistse0305
Posts: 2
Joined: Tue May 04, 2021 2:13 am

Post by louistse0305 » Wed May 26, 2021 9:13 am

Hi everyone,

I compiled the GPU-enabled MPAS following the documentation at https://mpas-dev.github.io/atmosphere/O ... index.html
using OpenMPI v3.1.3 and the PGI compiler 19.10.

Compilation went fine, and so did the static and init runs, but the model integration fails with a segmentation fault (see the attached files):
error_messages.txt
log.atmosphere.role02.0000.out.txt
log.atmosphere.role01.0000.out.txt

From the log files, it seems that the model crashed at the very beginning, just after the radiation call. I suspect that the tendencies produced by radiation could not be transferred to the GPUs in the first dynamics time step.
I am running on a Supermicro SuperServer 1029GQ-TVRT with 4 Tesla V100 GPUs:
https://www.supermicro.com/products/sys ... Q-TVRT.cfm
I am wondering whether this is a software problem (say, an improper installation of MPI or other libraries) or a hardware problem (say, the two CPUs on the server being unable to communicate with the 4 GPUs).
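
To help separate the software and hardware possibilities, a few read-only checks could be run on the node first. This is only a rough sketch: the helper name `run_if_present` is made up for illustration, and I am assuming (not asserting) that the CUDA-support flag of the Open MPI build is relevant to this crash.

```shell
#!/usr/bin/env bash
# Diagnostic sketch: report what the driver and the MPI build expose.
# run_if_present is a hypothetical helper; it skips tools that are not on PATH.

run_if_present() {
  local tool=$1
  if command -v "$tool" >/dev/null 2>&1; then
    "$@"
  else
    echo "skipped: $tool not found on PATH"
  fi
}

# List the GPUs the driver can see (4 would be expected on this server).
run_if_present nvidia-smi -L

# Show the GPU/CPU interconnect and affinity matrix.
run_if_present nvidia-smi topo -m

# Ask Open MPI whether it was built with CUDA support.
# Assumption: this may or may not matter for the segmentation fault above.
run_if_present ompi_info --parsable --all \
  | grep "mpi_built_with_cuda_support:value" \
  || echo "no CUDA-support flag reported by ompi_info"
```

If all four GPUs are listed and the topology matrix looks sane, a hardware communication problem seems less likely, though these checks cannot rule it out.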

Here are the commands for running the case:

export MPAS_DYNAMICS_RANKS_PER_NODE="24"
export MPAS_RADIATION_RANKS_PER_NODE="16"
gpmetis -minconn -contig -niter=200 480km.graph.info ${MPAS_DYNAMICS_RANKS_PER_NODE}
gpmetis -minconn -contig -niter=200 480km.graph.info ${MPAS_RADIATION_RANKS_PER_NODE}
mpirun -np 40 ./atmosphere_model &
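
A small pre-launch check along the following lines might catch a mismatch between the rank settings and the partition files. It is a sketch under two assumptions: a single node, so that the `-np` value is the sum of the dynamics and radiation ranks (24 + 16 = 40, as in the commands above), and gpmetis writing its output as `<graph file>.part.<N>` in the working directory. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Pre-launch sanity check (sketch). Values mirror the commands in this post.
DYN=24                    # MPAS_DYNAMICS_RANKS_PER_NODE
RAD=16                    # MPAS_RADIATION_RANKS_PER_NODE
GRAPH=480km.graph.info    # graph file passed to gpmetis

# Total MPI ranks on one node, assuming it is dynamics + radiation.
total_ranks() {
  echo $(( $1 + $2 ))
}

# Print the name of every expected partition file that does not exist.
missing_partitions() {
  local graph=$1
  shift
  local n
  for n in "$@"; do
    [ -f "${graph}.part.${n}" ] || echo "${graph}.part.${n}"
  done
}

echo "expected mpirun -np value: $(total_ranks "$DYN" "$RAD")"

missing=$(missing_partitions "$GRAPH" "$DYN" "$RAD")
if [ -n "$missing" ]; then
  echo "missing partition files:"
  echo "$missing"
fi
```

Whether the model looks for the partition files under this exact prefix depends on the namelist settings, so the file names reported here only reflect what gpmetis would have written.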

Any comments are welcome. Thank you!
