About roles and ranks

Questions about and discussion of the GPU-enabled MPAS-Atmosphere branch.
afernandezody
Posts: 29
Joined: Sat Mar 07, 2020 12:00 am

Post by afernandezody » Tue Nov 17, 2020 5:56 pm

Hello,
I just compiled the app and have started troubleshooting a few things. My first question is a simple one. When I use 1 rank, the message reads:

 My role is             3
 Role leader is             0

That's a bit odd, but it does create the log file with 'role03' inserted into the file name. It never uses the GPU and eventually crashes (maybe because of insufficient GPU memory). My next step was to use 2 ranks with 2 GPUs, but the result is more suspicious, as it produces:

 My role is             3
 Role leader is             0
 My role is             3
 Role leader is             0

and there is a single output file rather than 2. Obviously something is off, so any pointers would be welcome. Thanks.

mgduda
Posts: 494
Joined: Mon Feb 26, 2018 7:35 pm

Re: About roles and ranks

Post by mgduda » Tue Dec 08, 2020 5:35 pm

Have you set the environment variables MPAS_DYNAMICS_RANKS_PER_NODE and MPAS_RADIATION_RANKS_PER_NODE as described in the documentation? The GPU-enabled model will probably require at least 4 MPI ranks -- two ranks to run the radiation on CPUs and two ranks to run the rest of the model on GPUs -- since two CPU sockets are assumed, and the code tries to distribute ranks equally between sockets.
NCAR/MMM
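
Editor's sketch of the rank arithmetic implied by the reply above. It assumes, without confirmation from this thread, that the total number of MPI ranks launched on each node is the sum of MPAS_DYNAMICS_RANKS_PER_NODE and MPAS_RADIATION_RANKS_PER_NODE, and that each value should be an even number of at least 2 so the ranks can be split equally across two CPU sockets. The helper name ranks_to_launch is hypothetical and is not part of MPAS.

```python
import os


def ranks_to_launch(nodes=1, env=None):
    """Return the total MPI rank count to request from the launcher.

    ASSUMPTION (not confirmed in this thread): ranks per node equals
    MPAS_DYNAMICS_RANKS_PER_NODE + MPAS_RADIATION_RANKS_PER_NODE.
    """
    env = os.environ if env is None else env
    dynamics = int(env.get("MPAS_DYNAMICS_RANKS_PER_NODE", "0"))
    radiation = int(env.get("MPAS_RADIATION_RANKS_PER_NODE", "0"))

    for name, value in (("dynamics", dynamics), ("radiation", radiation)):
        # The reply mentions two ranks of each kind as the likely minimum.
        if value < 2:
            raise ValueError(f"{name} ranks per node should be at least 2, got {value}")
        # ASSUMPTION: an even count is needed to split equally over two sockets.
        if value % 2 != 0:
            raise ValueError(f"{name} ranks per node should be even, got {value}")

    return nodes * (dynamics + radiation)


if __name__ == "__main__":
    example = {
        "MPAS_DYNAMICS_RANKS_PER_NODE": "2",
        "MPAS_RADIATION_RANKS_PER_NODE": "2",
    }
    print(ranks_to_launch(nodes=1, env=example))
```

If the assumption holds, setting both variables to 2 on a single node gives 4 ranks, which matches the "at least 4 MPI ranks" mentioned in the reply and would correspond to launching with 'mpirun -np 4'.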

afernandezody
Posts: 29
Joined: Sat Mar 07, 2020 12:00 am

Re: About roles and ranks

Post by afernandezody » Wed Dec 09, 2020 11:13 pm

Thanks.
Maybe I didn't understand the instructions correctly (I had some doubts at the time). Just to clarify, I tried to run 2 MPI ranks, and my system is very different from Summit: I was testing on a single node with 4 CPUs and 2 GPUs. I have 3 questions:
1) Would my configuration (single node with 2+ CPUs and 2 GPUs) be feasible?
2) Are two nodes required to have 2 CPU sockets? (Honestly, I'm unsure if we're using the word 'socket' differently because of the Summit architecture)
3) If I wanted to run on 2 CPUs + 2 GPUs, and assuming that MPAS_DYNAMICS_RANKS_PER_NODE and MPAS_RADIATION_RANKS_PER_NODE are both set to 2, would I have to call 'mpirun -np 2' or 'mpirun -np 4'?

afernandezody
Posts: 29
Joined: Sat Mar 07, 2020 12:00 am

Re: About roles and ranks

Post by afernandezody » Mon Dec 14, 2020 11:23 pm

I made one last attempt, this time running on 2 nodes rather than 1. Although it didn't crash (no error message), the app never engaged the GPUs and, as far as I can tell, didn't advance even though it ran for a while. It's probably better to wait until v7 is ready and do full testing then.
