
Successful Setup for MPAS-A

nickheavens-cgg

New member
Dear all,

As requested, I am providing an outline of my successful port of the v6.x GPU version of MPAS-A to the following system:

CPU: 56 x Intel(R) Xeon(R) Gold 6258R @ 2.70 GHz
GPU: 2 x Quadro RTX 6000
OS: Debian 10

After substantial modification, I was able to build this with the GNU 12.3 compilers, but the GPU portions of the code did not compile properly.
I therefore obtained the Portland Group (NVIDIA) compilers from the NVIDIA HPC SDK (23.7-0).
I built separate MPI compiler wrappers by installing MPICH 4.1.2.
I needed to set PSM3_HAL=verbs or PSM3_DEVICES=self to prevent PSM3 from automatically engaging an interconnect device that does not exist on this machine.
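In practice this just means exporting one of the two variables before launching; which one works best depends on your fabric, and both are shown here only as examples:
Code:
# steer MPICH's PSM3/OFI provider away from non-existent interconnect hardware
export PSM3_HAL=verbs        # force the verbs interface
# or, alternatively, restrict PSM3 to self-only communication:
export PSM3_DEVICES=self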
I then installed, in order (a rough build sketch follows this list):
zlib-1.2.13
hdf5-1.14.1-2
pnetcdf-1.12.3
netcdf-c-4.9.2
netcdf-fortran-4.6.1
pio-2.6.0
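Roughly, that sequence looks like the sketch below. The install prefix and configure options are illustrative only (your paths and option choices may differ); the key points are building everything with the MPICH wrappers and pointing each library at the ones installed before it.
Code:
# illustrative prefix and compiler wrappers; adjust to your system
export PREFIX=/opt/mpas-libs
export CC=mpicc FC=mpifort CXX=mpicxx

# zlib-1.2.13
./configure --prefix=$PREFIX && make install

# hdf5-1.14.1-2 (parallel, with Fortran)
./configure --prefix=$PREFIX --enable-parallel --enable-fortran --with-zlib=$PREFIX && make install

# pnetcdf-1.12.3
./configure --prefix=$PREFIX && make install

# netcdf-c-4.9.2 (against the parallel HDF5 and PnetCDF above)
CPPFLAGS="-I$PREFIX/include" LDFLAGS="-L$PREFIX/lib" ./configure --prefix=$PREFIX --enable-pnetcdf --disable-dap && make install

# netcdf-fortran-4.6.1
CPPFLAGS="-I$PREFIX/include" LDFLAGS="-L$PREFIX/lib" ./configure --prefix=$PREFIX && make install

# pio-2.6.0 (CMake build, run from a separate build directory; variable names may differ by version)
cmake -DNetCDF_C_PATH=$PREFIX -DNetCDF_Fortran_PATH=$PREFIX -DPnetCDF_PATH=$PREFIX -DCMAKE_INSTALL_PREFIX=$PREFIX ../ParallelIO && make install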

I compiled using the pgi option in the Makefile but needed to change FFLAGS_ACC to:
Code:
FFLAGS_ACC = -Mnofma -acc -target=gpu -Minfo=accel

I needed to comment out the OpenACC directives for the ysu2d routine in src/core_atmosphere/physics/physics_wrf/module_bl_ysu.F. Individual loops were causing kernel errors, and the calculations kept producing NaNs.

Once this was done, I could run a realistic global simulation at 60 km resolution using:
Code:
MPAS_DYNAMICS_RANKS_PER_NODE=12 MPAS_RADIATION_RANKS_PER_NODE=20 MPICH_GPU_SUPPORT_ENABLED=1 mpiexec -np 32 atmosphere_model &> test.out
A split of 16 dynamics and 16 radiation ranks also works fine.
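That 16/16 split is launched the same way, e.g. (an illustrative variant of the command above):
Code:
MPAS_DYNAMICS_RANKS_PER_NODE=16 MPAS_RADIATION_RANKS_PER_NODE=16 MPICH_GPU_SUPPORT_ENABLED=1 mpiexec -np 32 atmosphere_model &> test.out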
With just two GPUs of 24 GB each, running at higher resolution would overwhelm the GPU memory.
 

mgduda

Administrator
Staff member
Thanks so much for the report, and apologies for the slow followup on our end!

It looks like the default FFLAGS_ACC in the develop-openacc branch has
Code:
FFLAGS_ACC = -Mnofma -acc -ta=tesla:cc70,cc80 -Minfo=accel
while the 'nvhpc' target in the current v8.0.1 release (which doesn't yet support GPU execution, apart from including what we believe to be the correct compiler flags) has
Code:
FFLAGS_ACC = -Mnofma -acc -gpu=cc70,cc80 -Minfo=accel
Since the 'nvhpc' target's flags differ from what you've had to use ("-Mnofma -acc -target=gpu -Minfo=accel"), it would be interesting to know whether "-Mnofma -acc -gpu=cc70,cc80 -Minfo=accel" would also work for your setup.
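If you do want to try that, a minimal sketch of the procedure (assuming a standard build from the top-level MPAS directory) is to swap the flags in the Makefile and do a clean rebuild:
Code:
# in the top-level Makefile, under the target you build with (e.g. 'pgi'), set:
#   FFLAGS_ACC = -Mnofma -acc -gpu=cc70,cc80 -Minfo=accel
make clean CORE=atmosphere
make pgi CORE=atmosphere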
 

nickheavens-cgg

New member
I've been on holiday, so I was unable to try this until this morning. Building the develop-openacc branch with the default FFLAGS_ACC is successful. However, my test case crashes at the first post-initialisation radiation call (30 minutes in). The presenting error is a failure of a pressure column test in mpas_atmphys_interface.F.

Code:
!check that the pressure in the layer above the surface is greater than that in the layer
!above it:
 do j = jts,jte
 do i = its,ite
    if(pres_p(i,1,j) .le. pres_p(i,2,j)) then
       call mpas_log_write('')
       call mpas_log_write('--- subroutine MPAS_to_phys - pressure(1) < pressure(2):')
       call mpas_log_write('i =$i', intArgs=(/i/))
       call mpas_log_write('latCell=$r', realArgs=(/latCell(i)/degrad/))
       call mpas_log_write('lonCell=$r', realArgs=(/lonCell(i)/degrad/))
       do k = kts,kte
          call mpas_log_write('$i $i $i $r $r $r $r $r $r $r $r', intArgs=(/j,i,k/),&
                 realArgs=(/dz_p(i,k,j),pressure_b(k,i),pressure_p(k,i),pres_p(i,k,j), &
                            rho_p(i,k,j),th_p(i,k,j),t_p(i,k,j),qv_p(i,k,j)/))
I've attached an example of the log.



My best guess is that compiling the default flags specifically for V100 (cc70) and A100 (cc80) architectures is inappropriate for the RTX 6000. Not specifying a compute architecture probably produces a more generalised, less optimised result. Take from that what you will.
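For what it's worth, the Quadro RTX 6000 is a Turing-class card (compute capability 7.5), so an architecture-specific flag for it would presumably look something like the line below; I have not tested this.
Code:
FFLAGS_ACC = -Mnofma -acc -gpu=cc75 -Minfo=accel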

Nick
 

gdicker

New member
Staff member
@nickheavens-cgg, thanks from me as well for trying this, and especially for documenting it on the forum.

The error you noted is something that a colleague and I are currently working on. It seems that a pressure variable is getting incorrect (zero) values, which sends other calculations off-track and eventually causes a segfault. Sadly there is no solution at the moment, but I will post an update when there is one.
 