I want to run MPAS-A openacc v6 or v7 on our GPU cluster.
It is an HPE Apollo 6500 Linux cluster with 8 nodes, each equipped with 8 NVIDIA A100 GPUs, using NVIDIA/Mellanox HDR100 InfiniBand interconnect. The compute nodes are structured as follows:
Nodes | CPUs | Memory | GPUs
4 Compute (A100 GPU) | 128 cores (2x 64-core 2.00 GHz AMD EPYC Milan 7713) | 1 TB (16x 64 GB DDR4 dual-rank 3200 MHz) | 8x NVIDIA A100 (80 GB), mig=1
2 Compute (A100 GPU) | 128 cores (2x 64-core 2.00 GHz AMD EPYC Milan 7713) | 1 TB (16x 64 GB DDR4 dual-rank 3200 MHz) | 8x NVIDIA A100 (80 GB), mig=2
2 Compute (A100 GPU) | 128 cores (2x 64-core 2.00 GHz AMD EPYC Milan 7713) | 1 TB (16x 64 GB DDR4 dual-rank 3200 MHz) | 8x NVIDIA A100 (80 GB), mig=7
We currently have openmpi/4.1.6 compiled with nvhpc/24.7. I know additional libraries will be required for compilation, but I have a few questions first:
1) Is "GPU-enabled MPAS-Atmosphere — MPAS Atmosphere documentation" still the latest documentation for the GPU-enabled MPAS?
2) What are the differences between the two branches atmosphere/v6.x-openacc and atmosphere/develop-openacc?
3) I am also looking for suggestions on how to verify that the resources allocated in a batch/Slurm script are actually being used correctly once I have the model compiled.
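To make question 3 concrete, here is a minimal sketch of the kind of Slurm script I have in mind for one of the A100 nodes. The partition name, module versions, and binding options are assumptions about our site, not tested settings:

```shell
#!/bin/bash
#SBATCH --job-name=mpas-a100
#SBATCH --nodes=1                  # one Apollo 6500 node
#SBATCH --ntasks-per-node=8        # one MPI rank per GPU
#SBATCH --cpus-per-task=16         # 128 cores / 8 ranks
#SBATCH --gres=gpu:8               # request all 8 A100s on the node
#SBATCH --time=02:00:00
#SBATCH --partition=gpu            # assumed partition name

module load nvhpc/24.7 openmpi/4.1.6

# Since our MPI may not be srun-integrated, launch with mpirun instead,
# placing 8 ranks per node and binding each rank's 16 cores.
mpirun -np "${SLURM_NTASKS}" \
       --map-by ppr:8:node:PE="${SLURM_CPUS_PER_TASK}" --bind-to core \
       ./atmosphere_model
```

My assumption is that one MPI rank per GPU is the intended decomposition for the OpenACC build, but I would welcome corrections on the rank/GPU mapping.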
Additionally, I have compiled MPAS v8.3.1 using nvhpc 25.1 (MPI, compilers, and CUDA) and parallel-netcdf. All of these libraries were installed locally in one of my directories, and I don't think my NVHPC MPI installation works with srun. I plan to switch to the system installations of nvhpc and openmpi mentioned above, but have not done so yet.
The executables were linked against the CUDA libraries at compile time, and the log file shows that the GPUs are being detected, but no VRAM appears to be used during the runs. Are there any suggestions to remedy this?
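In case it is useful, this is how I have been trying to check whether the OpenACC runtime is actually launching kernels; the NVHPC environment variables below reflect my understanding of the runtime's diagnostics:

```shell
# Ask the NVHPC OpenACC runtime to report activity at run time:
#   1 = kernel launches, 2 = data transfers, 3 = both.
export NVCOMPILER_ACC_NOTIFY=3

# Print a per-region device-time summary when the program exits.
export NVCOMPILER_ACC_TIME=1

# In a second shell on the compute node, watch device memory while the
# model runs; a flat 0 MiB here means nothing is resident on the GPU.
nvidia-smi --query-gpu=index,memory.used --format=csv -l 2
```

If NOTIFY prints nothing during a run, my understanding is that the executable was built for the host only, and the first thing to check would be recompiling with `-acc=gpu -gpu=cc80 -Minfo=accel` and confirming the compile log shows "Generating NVIDIA GPU code" messages, but I'd appreciate confirmation that this diagnosis is right.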