Compiling and Running MPAS on GPU cluster

AKnight

New member
I want to run the OpenACC (GPU) version of MPAS-A, v6 or v7, on our GPU cluster.

It is an HPE Apollo 6500 Linux cluster with 8 nodes, each equipped with 8 NVIDIA A100 GPUs, using an NVIDIA/Mellanox HDR100 InfiniBand interconnect. The compute nodes are structured as follows:
  • 4 compute nodes (A100 GPU): 128 cores (2x 64-core 2.00 GHz AMD EPYC Milan 7713), 1 TB memory (16x 64 GB DDR4 dual-rank 3200 MHz), 8x NVIDIA A100 (80 GB), mig=1
  • 2 compute nodes (A100 GPU): 128 cores (2x 64-core 2.00 GHz AMD EPYC Milan 7713), 1 TB memory (16x 64 GB DDR4 dual-rank 3200 MHz), 8x NVIDIA A100 (80 GB), mig=2
  • 2 compute nodes (A100 GPU): 128 cores (2x 64-core 2.00 GHz AMD EPYC Milan 7713), 1 TB memory (16x 64 GB DDR4 dual-rank 3200 MHz), 8x NVIDIA A100 (80 GB), mig=7

We currently have openmpi/4.1.6 compiled with nvhpc/24.7. I know additional libraries would be required for compilation. However, I have a few questions:
1) Is this the latest documentation on GPU-enabled MPAS: GPU-enabled MPAS-Atmosphere — MPAS Atmosphere documentation?
2) What are the differences between the two branches atmosphere/v6.x-openacc and atmosphere/develop-openacc?
3) I am also looking for suggestions on how to ensure correct usage of the resources allocated in a batch/Slurm script once I have the model compiled.

Additionally, I have compiled MPAS v8.3.1 using nvhpc 25.1 (MPI, compilers, and CUDA) and parallel-netCDF. All of these libraries were installed locally in one of my directories, and I don't think my nvhpc MPI installation accepts srun. I do plan on switching to the system installations of nvhpc and openmpi mentioned above but have not done so yet.
The executables were linked against the CUDA libraries at compile time, and the log file shows that the GPUs are being detected, but the model does not appear to use any VRAM during the runs. Are there any suggestions to remedy this?
 
Hi there,

While I am not familiar with the previous GPU ports of MPAS (another staff member can perhaps help you better with those), I can try to help with getting v8.3.1 to run on GPUs. Could you please share the following details:
  1. The full list of modules loaded at build-time, and your build command.
  2. The section of the Makefile specific to your machine (showing the build options)
  3. The model run log files + your command to launch the MPAS model on GPUs
  4. Also, how do you check the GPU/device memory during the model runs?
Thanks!
 
The full list of modules loaded at build-time, and your build command.
1. I used the module file that comes with the nvhpc 25.1 installation (attached to this post). Additionally, I installed pnetcdf/1.14.1 and loaded it with the other module file attached.
The section of the Makefile specific to your machine ( Showing the build options)
2. I may be misunderstanding the question, but I used the "nvhpc" build target (make -j 4 CORE=atmosphere OPENACC=true); a rough sketch of the full sequence is below.
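
For reference, the overall build sequence was roughly the sketch below. The module names and the pnetcdf path are placeholders rather than my exact local setup (the real module files are attached); the key detail is that the PNETCDF environment variable pointed at my parallel-netCDF installation before running make.

# Rough sketch of the build steps; module names and paths are placeholders
module load nvhpc/25.1 pnetcdf/1.14.1           # my locally installed modules
export PNETCDF=/path/to/pnetcdf-1.14.1          # picked up by the MPAS Makefile
make -j 4 nvhpc CORE=atmosphere OPENACC=true    # "nvhpc" target with OpenACC enabled
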
The model run log files + your command to launch the MPAS model on GPUs
3. I am attaching my submission script and the log files. I have struggled to understand some of the documentation, so please excuse my likely clumsy attempt at running it. I have successfully run multiple larger CPU runs, but this is completely new territory.
Also, how do you check the GPU/device memory during the model runs?
4. For starters, I kept running into the issue of multiple processes being assigned to one GPU, so I couldn't even get past that. To check, I used nvidia-smi and nvidia-smi --query-gpu=utilization.gpu,utilization.memory.

Additionally, I have attached the output of "ldd atmosphere_model" in the ldd.txt file.
 

Attachments

  • 1.14.1.txt (1.1 KB)
  • 25.1.txt (2.1 KB)
  • ldd.txt (3 KB)
  • log.atmosphere.0000.txt (153.2 KB)
  • run_model.txt (995 bytes)

Thanks for the context! So, a few things:

1. The environment variables MPAS_DYNAMICS_RANKS_PER_NODE and MPAS_RADIATION_RANKS_PER_NODE are not used in the ongoing GPU port of MPAS v8. This also brings us to another important point: the version 8 GPU port is a work in progress, and only the dynamical core has been ported so far. If you notice that v8 runs much slower on GPUs than it does on CPUs, that is expected.

2. I think your build command seems reasonable, but let's work on the command for the model run. To ensure that each MPI rank gets assigned a dedicated device, we use the CUDA_VISIBLE_DEVICES environment variable, as you know. However, it needs to be set in a separate wrapper script so that each MPI task sees a different value of CUDA_VISIBLE_DEVICES.

For example, to run the model on 4 MPI ranks, with each rank using its own GPU:

mpirun -np 4 ./set_gpu_rank.sh ./atmosphere_model

where set_gpu_rank.sh contains

#!/bin/bash
# Pick a GPU from this rank's local ID on the node, then run the given command.
# SLURM_LOCALID is set by srun; Open MPI's mpirun sets OMPI_COMM_WORLD_LOCAL_RANK.
LOCAL_RANK=${SLURM_LOCALID:-$OMPI_COMM_WORLD_LOCAL_RANK}
GPUS=(0 1 2 3)
export CUDA_VISIBLE_DEVICES=${GPUS[$LOCAL_RANK]}
exec "$@"
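
To also touch on question 3 from your first post (making sure the job actually uses the resources it was allocated), below is a rough sketch of how the whole thing might look as a Slurm batch script. The partition and module names are placeholders for your system, and depending on how your MPI installation was built you may be able to use srun instead of mpirun; the key points are requesting one MPI task per GPU you intend to use and making the wrapper script executable.

#!/bin/bash
#SBATCH --job-name=mpas_gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4     # one MPI rank per GPU used
#SBATCH --gpus-per-node=4       # request 4 of the node's 8 A100s
#SBATCH --cpus-per-task=8
#SBATCH --time=01:00:00
#SBATCH --partition=gpu         # placeholder partition name

module load nvhpc openmpi       # placeholder names for the system-wide installs

chmod +x set_gpu_rank.sh
mpirun -np 4 ./set_gpu_rank.sh ./atmosphere_model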

Let me know if this fixes some of your issues.

3. Regarding nvidia-smi: if you launch the job, log in to the node, and run nvidia-smi to look at the visual output, it can sometimes show no GPU usage because of its low sampling frequency. However, you might see the GPU being used if you save the query output to a CSV file, for example. If you still don't, try reducing the query interval to a few milliseconds.
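
For example, something along these lines in the batch script, started just before the model launch, should capture usage over time (the file name and the one-second interval are only suggestions):

# Log per-GPU utilization and memory to a CSV file once per second;
# use -lms 500 or similar if you want sub-second sampling.
nvidia-smi --query-gpu=timestamp,index,utilization.gpu,utilization.memory,memory.used \
           --format=csv -l 1 > gpu_usage.csv &
SMI_PID=$!

mpirun -np 4 ./set_gpu_rank.sh ./atmosphere_model

# Stop the logger once the model run has finished
kill $SMI_PID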

Alternatively, you can also use the NVIDIA Nsight tools to confirm that the model is running on the GPUs.
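
As a rough example (the exact trace options depend on the Nsight Systems version available on your cluster), profiling each rank along these lines should show whether any CUDA kernels or OpenACC regions are actually being launched:

# One Nsight Systems report per rank; %q{...} expands the environment variable,
# so each rank writes its own report file.
mpirun -np 4 ./set_gpu_rank.sh \
    nsys profile --trace=cuda,openacc,mpi -o mpas_rank_%q{OMPI_COMM_WORLD_RANK} \
    ./atmosphere_model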
 