nickheavens-cgg
Dear all,
As requested, I am providing an outline of my successful port of MPAS-A GPU v6.x to the following system:
CPU: 56xIntel(R) Xeon(R) Gold 6258R CPU @ 2.70 GHz
GPU: 2xQuadro RTX 6000
OS: Debian 10
After substantial modification, I was able to build this with the GNU 12.3 compilers, but it did not compile properly for the GPUs themselves.
I therefore obtained the Portland Group (PGI) compilers from the NVIDIA HPC SDK (23.7-0).
I built separate MPI compilers by installing MPICH 4.1.2.
I needed to set PSM3_HAL=verbs or PSM3_DEVICES=self to prevent the automatic engagement of an interconnect device that does not exist.
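For anyone reproducing this, a minimal sketch of how one of those settings could be applied in the shell before launching any MPI job. Only the variable names and values come from the post; putting them in a job script or shell startup file is an assumption about your setup.

```shell
# Option 1 from the post: restrict PSM3 to the "self" device so it does not
# try to open an interconnect that is not present on this workstation.
export PSM3_DEVICES=self

# Option 2 from the post (use instead of option 1, not in addition):
# export PSM3_HAL=verbs

# Show what the MPI processes will inherit.
echo "PSM3_DEVICES=${PSM3_DEVICES:-unset} PSM3_HAL=${PSM3_HAL:-unset}"
```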
I then installed, in order:
zlib-1.2.13
hdf5-1.14.1-2
pnetcdf-1.12.3
netcdf-c-4.9.2
netcdf-fortran-4.6.1
pio-2.6.0
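Once the libraries are installed, the MPAS Makefile locates them through the NETCDF, PNETCDF and PIO environment variables. A minimal sketch, assuming everything was installed under one shared prefix; the path $HOME/mpas-libs is hypothetical and not from the post, so substitute your own install location.

```shell
# Hypothetical install prefix shared by all six libraries (an assumption).
PREFIX="$HOME/mpas-libs"

# Variables the MPAS Makefile reads to find the I/O libraries.
export NETCDF="$PREFIX"
export PNETCDF="$PREFIX"
export PIO="$PREFIX"

# Make the installed tools and shared libraries visible at build and run time.
export PATH="$PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$PREFIX/lib:${LD_LIBRARY_PATH:-}"

echo "NETCDF=$NETCDF PNETCDF=$PNETCDF PIO=$PIO"
```

If each library went into its own directory instead, point each variable at the matching prefix.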
I compiled using the pgi target in the Makefile, but needed to change FFLAGS_ACC to:
"FFLAGS_ACC = -Mnofma -acc -target=gpu -Minfo=accel" \
I also needed to comment out the OpenACC directives for the ysu2d routine in src/core_atmosphere/physics/physics_wrf/module_bl_ysu.F. Individual loops were causing kernel errors, and the calculations kept producing NaNs.
Once this was done, I could run a realistic global simulation at 60 km resolution using:
MPAS_DYNAMICS_RANKS_PER_NODE=12 MPAS_RADIATION_RANKS_PER_NODE=20 MPICH_GPU_SUPPORT_ENABLED=1 mpiexec -np 32 atmosphere_model &> test.out
Using 16 ranks each for dynamics and radiation also works fine.
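Both splits quoted here (12 + 20 and 16 + 16) add up to the 32 ranks passed to mpiexec. A small sketch that derives -np from the two per-node values and prints the launch line rather than running it; treating -np as the sum of the two groups on a single node is an inference from those numbers, not something stated in the post, and the variable names DYN_RANKS, RAD_RANKS, NP and CMD are purely illustrative.

```shell
# Rank split taken from the post; set both to 16 to try the even split.
DYN_RANKS=12   # value for MPAS_DYNAMICS_RANKS_PER_NODE
RAD_RANKS=20   # value for MPAS_RADIATION_RANKS_PER_NODE

# Assumption: on a single node, the total rank count is the sum of both groups.
NP=$((DYN_RANKS + RAD_RANKS))

# Build the launch line as a string so it can be inspected before running it.
CMD="MPAS_DYNAMICS_RANKS_PER_NODE=$DYN_RANKS MPAS_RADIATION_RANKS_PER_NODE=$RAD_RANKS MPICH_GPU_SUPPORT_ENABLED=1 mpiexec -np $NP atmosphere_model"
echo "$CMD"
```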
With just 2 GPUs with 24 GB of memory each, running at higher resolution will overwhelm my GPU memory.
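To see how close a run gets to that limit, GPU memory use can be polled with nvidia-smi while the model is running. A minimal sketch; the helper name gpu_mem_report is made up for illustration, and it falls back to a message when nvidia-smi is unavailable.

```shell
# Print per-GPU memory use in CSV form, or a note if the query is not possible.
gpu_mem_report() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv \
            || echo "nvidia-smi query failed"
    else
        echo "nvidia-smi not found on PATH"
    fi
}

gpu_mem_report
```

Running it in a second terminal during the first few model time steps should show whether memory use is already near the 24 GB per card.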