
Failure in model integration of GPU-enabled MPAS

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

louistse0305

New member
Hi everyone,

I compiled the GPU-enabled MPAS following the documentation at https://mpas-dev.github.io/atmosphere/OpenACC/index.html
using OpenMPI v3.1.3 and the PGI Compiler 19.10.

Compilation went fine, and so did the static and init runs, but the model integration fails with a segmentation fault (see the attached files):
View attachment error_messages.txt
View attachment log.atmosphere.role02.0000.out.txt
View attachment log.atmosphere.role01.0000.out.txt

From the log files, it seems that the model crashed at the very beginning, just after the radiation call. I suspect that the tendencies produced by radiation could not be transferred to the GPUs in the first dynamics time step.
I am running on a Supermicro SuperServer 1029GQ-TVRT with four Tesla V100 GPUs:
https://www.supermicro.com/products/system/1U/1029/SYS-1029GQ-TVRT.cfm
I am wondering whether this is a software problem (say, an improper installation of MPI or other libraries) or a hardware problem (say, the two CPUs on the server being unable to communicate with the four GPUs).
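
One way to start separating the software and hardware possibilities is to look at how the GPUs are attached to the two CPU sockets and whether the Open MPI build reports CUDA support. The sketch below is only a starting point, not a diagnosis: the function name `gpu_mpi_diagnostics` is made up for illustration, and the Open MPI check only matters if the build is expected to rely on CUDA-aware MPI, which is an assumption here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MPAS): print basic GPU/MPI diagnostics.
gpu_mpi_diagnostics() {
  # GPU <-> GPU and GPU <-> CPU connectivity matrix, including CPU affinity.
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi topo -m || echo "nvidia-smi topo -m failed"
  else
    echo "nvidia-smi not found in PATH"
  fi

  # Whether this Open MPI installation was built with CUDA support.
  if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --parsable --all | grep mpi_built_with_cuda_support:value \
      || echo "ompi_info did not report a CUDA support flag"
  else
    echo "ompi_info not found in PATH"
  fi
}

gpu_mpi_diagnostics
```

If the topology matrix shows all four GPUs and the expected CPU affinities, a plain hardware connectivity problem becomes less likely, though this does not rule it out.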

Here are the commands for running the case:
Code:
export MPAS_DYNAMICS_RANKS_PER_NODE="24"
export MPAS_RADIATION_RANKS_PER_NODE="16"
gpmetis -minconn -contig -niter=200 480km.graph.info ${MPAS_DYNAMICS_RANKS_PER_NODE}
gpmetis -minconn -contig -niter=200 480km.graph.info ${MPAS_RADIATION_RANKS_PER_NODE}
mpirun -np 40 ./atmosphere_model &
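
As a sanity check on the commands above, a small sketch like the following could confirm that the dynamics and radiation rank counts add up to the total passed to `mpirun`, and that both partition files exist. It assumes a single node, so that the per-node counts equal the totals, and it relies on `gpmetis` writing its output to `<graph file>.part.<number of parts>`. The function name `check_rank_layout` is hypothetical and not part of MPAS.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MPAS): verify that the dynamics and
# radiation rank counts add up to the total given to mpirun, and that the
# partition files written by gpmetis are present.
check_rank_layout() {
  local graph="$1" dyn="$2" rad="$3" total="$4"
  local status=0 n

  if [ $((dyn + rad)) -ne "$total" ]; then
    echo "rank mismatch: ${dyn} + ${rad} != ${total}"
    status=1
  fi

  # gpmetis writes its output to <graph file>.part.<number of parts>
  for n in "$dyn" "$rad"; do
    if [ ! -f "${graph}.part.${n}" ]; then
      echo "missing partition file: ${graph}.part.${n}"
      status=1
    fi
  done

  return "$status"
}

# Example with the values from this post (run in the case directory):
# check_rank_layout 480km.graph.info 24 16 40 && echo "layout looks consistent"
```

A non-zero return here would point to a setup problem, such as an unset environment variable or a missing partition file, rather than to the GPUs themselves.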

Any comments are welcome. Thank you!