Failure in model integration of GPU-enabled MPAS

This post was from a previous version of the WRF&MPAS-A Support Forum. Please do not add new replies here and if you would like the thread moved out of the Historical / Archive section then contact us, making sure to include the link of the thread to be moved.

louistse0305

Hi everyone,

I compiled the GPU-enabled MPAS following the documentation at https://mpas-dev.github.io/atmosphere/OpenACC/index.html
using OpenMPI v3.1.3 and PGI Compiler 19.10.

Compilation looks fine, and the static and init runs complete as well, but a segmentation fault occurs as soon as model integration starts (see attached files):
View attachment error_messages.txt
View attachment log.atmosphere.role02.0000.out.txt
View attachment log.atmosphere.role01.0000.out.txt

From the log files, it seems that the model crashed at the very beginning, just after the radiation call. I suspect that the tendencies produced by radiation could not be transferred to the GPUs in the first dynamics time step.
In addition, I am using a Supermicro SuperServer 1029GQ-TVRT with 4 Tesla V100 GPUs:
https://www.supermicro.com/products/system/1U/1029/SYS-1029GQ-TVRT.cfm
I am wondering whether this is a software problem (say, an improper installation of MPI or other libraries) or a hardware problem (say, the two CPUs on the server being unable to communicate with the 4 GPUs).

Here are the commands for running the case:
Code:
export MPAS_DYNAMICS_RANKS_PER_NODE="24"
export MPAS_RADIATION_RANKS_PER_NODE="16"
gpmetis -minconn -contig -niter=200 480km.graph.info ${MPAS_DYNAMICS_RANKS_PER_NODE}
gpmetis -minconn -contig -niter=200 480km.graph.info ${MPAS_RADIATION_RANKS_PER_NODE}
mpirun -np 40 ./atmosphere_model &
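
One thing that could be ruled out before launching is a mismatch between the two rank groups and the total passed to mpirun. The sketch below is only a sanity check under my own assumptions: a single node, so that dynamics ranks plus radiation ranks should equal the -np value, and partition files named the way gpmetis names its output (<graphfile>.part.<nparts>). The helper check_rank_layout is a hypothetical name and is not part of MPAS.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPAS): checks that both rank counts are
# set and that they add up to the total given to "mpirun -np".
check_rank_layout() {
    dyn="$1"; rad="$2"; total="$3"
    if [ -z "$dyn" ] || [ -z "$rad" ]; then
        echo "unset"        # one of the variables is empty or was never exported
        return 1
    fi
    if [ $(( $dyn + $rad )) -ne "$total" ]; then
        echo "mismatch"     # rank groups do not add up to the mpirun total
        return 1
    fi
    echo "ok"
}

# Values taken from the commands above; 40 is the -np value (single node assumed).
export MPAS_DYNAMICS_RANKS_PER_NODE="24"
export MPAS_RADIATION_RANKS_PER_NODE="16"
check_rank_layout "$MPAS_DYNAMICS_RANKS_PER_NODE" "$MPAS_RADIATION_RANKS_PER_NODE" 40

# gpmetis writes its result to <graphfile>.part.<nparts>; report any that are missing.
for n in "$MPAS_DYNAMICS_RANKS_PER_NODE" "$MPAS_RADIATION_RANKS_PER_NODE"; do
    [ -f "480km.graph.info.part.$n" ] || echo "missing partition file: 480km.graph.info.part.$n"
done
```

If the first check prints anything other than "ok", or a partition file is reported missing, the run setup itself would be worth fixing before looking at MPI or the hardware.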

Any comments are welcome, thank you!
 