
Running Multiple WRF Executables Using MPI on One Cheyenne Node

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.


We’re running WRF using MPI on 16 processors which is the number we arrived at by using your guidance on how many processors to use given the size of our domain. Each Cheyenne node has 36 cores, so we should be able to run 2 WRF exes on each node, correct? Right now, in our PBS submit script we have the line:

#PBS -l select=1:ncpus=16:mpiprocs=16

And then:

mpiexec_mpt dplace -s 1 ./wrf.exe >& wrf.log

for the exe line.

Is there a quick and easy way to change our submit process so we can cut our CPU-hour charges in half and put 2 of our wrf.exe runs on the same node, each using 16 cores for MPI? Would it involve using a command file with two different wrf.exe runs in different directories, and then increasing ncpus and mpiprocs to 32?

Hi Pat,
If I understand correctly, you trying to run 2 concurrent wrf.exe jobs at the same time, but wanting to put them on the same batch job? If so, I'm not sure there is a way to do exactly what you’re asking; however, you could run 2 runs at the same time, on 2 different batch jobs in the share queue. The share queue only allows for 18 total processors, and will only charge you for the number that you are using. If you haven’t already seen this page, perhaps it will be helpful?

If this doesn't help, try contacting the CISL support group to see if they can offer a better solution.
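To illustrate the share-queue suggestion above, a submit script for one of the two runs might look like the sketch below (the queue name `share` and the directive syntax follow Cheyenne's PBS conventions; the project code and walltime are placeholders you would replace with your own):

```shell
#!/bin/bash
# Sketch of a share-queue submission: one 16-core wrf.exe run per batch job.
# Submit a second, identical job from the other run directory.
#PBS -N wrf_run1
#PBS -A PROJECT_CODE          # placeholder: your project allocation
#PBS -q share                 # shared queue: charged only for cores requested
#PBS -l select=1:ncpus=16:mpiprocs=16
#PBS -l walltime=06:00:00     # placeholder walltime

cd $PBS_O_WORKDIR
mpiexec_mpt dplace -s 1 ./wrf.exe >& wrf.log
```

In the share queue the two jobs may not start simultaneously, but each is billed only for its 16 cores, which achieves the same halving of charges.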
Hi Pat...

What you are looking for can be difficult. Take it from someone with experience doing this with SLURM.

I looked at the CISL web pages and found something. It looks like "mpiexec_mpt" supports an "omplace" argument with which you can list the CPUs to pin the processes to. The web page says to do a "man omplace" for details.

Be sure to put the "mpiexec_mpt" commands in the background and do a "wait" after them. You may also need a "sleep 10" between the "mpiexec_mpt" commands, since race conditions are possible at startup.
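Putting the pieces above together, a single-job script might look like the following sketch. This is an assumption-laden example, not a tested recipe: the `-np` and `omplace -c` usage should be checked against `man mpiexec_mpt` and `man omplace` on Cheyenne, and the directory names `run1`/`run2` are placeholders for your two run directories:

```shell
#!/bin/bash
# Sketch: two 16-rank wrf.exe runs sharing one 36-core Cheyenne node.
#PBS -N wrf_pair
#PBS -A PROJECT_CODE          # placeholder: your project allocation
#PBS -l select=1:ncpus=32:mpiprocs=32
#PBS -l walltime=06:00:00     # placeholder walltime

cd $PBS_O_WORKDIR

# First run, pinned to CPUs 0-15 (flag usage assumed; verify with `man omplace`)
cd run1
mpiexec_mpt -np 16 omplace -c 0-15 ./wrf.exe >& wrf.log &

sleep 10   # guard against possible race conditions between MPI startups

# Second run, pinned to CPUs 16-31
cd ../run2
mpiexec_mpt -np 16 omplace -c 16-31 ./wrf.exe >& wrf.log &

wait       # keep the batch job alive until both backgrounded runs finish
```

Without the final `wait`, the batch job would exit as soon as the script reaches its end, killing both backgrounded runs.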