
Running Multiple WRF using MPI on 1 Cheyenne Node

This post is from a previous version of the WRF & MPAS-A Support Forum. New replies have been disabled. If you have follow-up questions related to this post, please start a new thread from the forum home page.

pclemins

New member
We’re running WRF using MPI on 16 processors, which is the number we arrived at by following your guidance on how many processors to use given the size of our domain. Each Cheyenne node has 36 cores, so we should be able to run 2 WRF executables on each node, correct? Right now, in our PBS submit script we have the line:

#PBS -l select=1:ncpus=16:mpiprocs=16

And then:

mpiexec_mpt dplace -s 1 ./wrf.exe >& wrf.log

for the exe line.

Is there a quick and easy way to change our submit process so we can cut our CPU-hour charges in half and put 2 of our wrf.exe runs on the same node, each using 16 cores for MPI? Would it involve using a command file with two different wrf.exe runs in different directories and then increasing ncpus and mpiprocs to 32?
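For example, would it be something like this for the resource request (just to illustrate what I have in mind):

#PBS -l select=1:ncpus=32:mpiprocs=32

with each wrf.exe launched from its own run directory?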

Thanks,
Pat
 
Hi Pat,
If I understand correctly, you’re trying to run 2 concurrent wrf.exe runs, but want to put them in the same batch job? If so, I'm not sure there is a way to do exactly what you’re asking; however, you could run the 2 runs at the same time as 2 different batch jobs in the share queue. The share queue only allows 18 total processors per job, and will only charge you for the number you are using. If you haven’t already seen this page, perhaps it will be helpful?
https://www2.cisl.ucar.edu/resources/computational-systems/cheyenne/running-jobs/job-submission-queues
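For example, each of the 2 runs could be submitted as its own small job along these lines. The project code, walltime, and paths below are placeholders, and you may need to adjust the launch command for shared nodes, so treat this as a rough sketch rather than a tested script:

#!/bin/bash
#PBS -N wrf_run_A
#PBS -A <project_code>
#PBS -q share
#PBS -l select=1:ncpus=16:mpiprocs=16
#PBS -l walltime=12:00:00
#PBS -j oe

# One 16-rank WRF run; the second run gets an identical script in its own directory
cd /path/to/run_A
mpiexec_mpt dplace -s 1 ./wrf.exe >& wrf.log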

If this doesn't help, try to contact the CISL support group to see if they can offer a better solution.
 
Hi Pat...

What you are looking for can be difficult. Take it from someone with experience doing this with SLURM.

I looked at the CISL web pages and found something that may help. It looks like "mpiexec_mpt" supports an "omplace" argument, which lets you list which CPUs to pin the processes to. The web page says to do a "man omplace" for more info.
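I haven't tried this on Cheyenne myself, but I would guess the syntax looks roughly like the line below; check "man omplace" for the real option names and CPU numbering:

mpiexec_mpt -np 16 omplace -c 0-15 ./wrf.exe >& wrf.log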

Be sure to put the "mpiexec_mpt" commands in the background and do a "wait" after them. You may also need a "sleep 10" between the "mpiexec_mpt" commands, since race conditions are possible at startup.
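Putting those two ideas together, a single-node script might look something like the sketch below. This is untested; the CPU ranges and paths are guesses, and you should confirm that two mpiexec_mpt launches inside one batch job actually share the node cleanly:

#PBS -l select=1:ncpus=32:mpiprocs=32

# First 16-rank run, pinned (I think) to the first 16 CPUs, launched in the background
cd /path/to/run_A
mpiexec_mpt -np 16 omplace -c 0-15 ./wrf.exe >& wrf.log &

# Pause between launches to avoid possible race conditions
sleep 10

# Second 16-rank run on the remaining CPUs
cd /path/to/run_B
mpiexec_mpt -np 16 omplace -c 16-31 ./wrf.exe >& wrf.log &

# Keep the batch job alive until both runs finish
wait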
 