WRF Parallel Processing on AWS PCluster

Jackson_yip

Hi all,

I've been running into an issue getting a WRF build (dmpar configuration, compiled with GFortran) to run in parallel across the 6 nodes I have specified in my AWS ParallelCluster. I have loaded all the required dependencies and packages using Spack, and both WPS and real.exe run correctly. However, when I submit the job script, only a single node actually processes the model: its rsl.out file keeps printing time stamps, while none of the other nodes' rsl.out files get past the initial 'W-damping begins at W-courant number...' line. My questions are threefold:

- How do I configure WRF so that it tiles my domain properly and runs it in parallel?
- It seems to depend both on how WRF was configured and compiled and on the &domains settings in namelist.input (attached). Is that correct, or am I missing something?
- How can I tell from the rsl.out or rsl.error files, while WRF is running, whether the tiling is actually working in parallel mode?

Any help is greatly appreciated,
J

Attachment: namelist.input.pdf

Hi,
If you have built WRF for parallel computing (e.g., dmpar), then that build, plus the batch script you use to submit the run, is all that should be required to run in parallel. Are you also running real.exe in parallel, with multiple processors?
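If you want to double-check that the executable really comes from a dmpar build, the configure.wrf file in the top-level WRF directory records the options chosen at configure time. The shell function below is a minimal sketch; it assumes (please verify against your own file) that an MPI build contains a line such as "DMPARALLEL = 1", while serial and smpar builds comment that value out. The function name is_dmpar_build is hypothetical.

```shell
#!/bin/bash
# Sketch: report whether a WRF build was configured for distributed memory.
# ASSUMPTION: configure.wrf contains "DMPARALLEL = 1" (any spacing) for
# dmpar / dm+sm builds, and "DMPARALLEL = # 1" otherwise.
is_dmpar_build() {
    local cfg=${1:-configure.wrf}
    if [ ! -f "$cfg" ]; then
        echo "no $cfg found" >&2
        return 2
    fi
    # Matches "DMPARALLEL = 1" but not the commented-out "DMPARALLEL = # 1"
    if grep -Eq '^DMPARALLEL[[:space:]]*=[[:space:]]*1' "$cfg"; then
        echo "dmpar build"
        return 0
    fi
    echo "not a dmpar build"
    return 1
}

# Usage, from the top-level WRF directory:
#   is_dmpar_build configure.wrf
```

If this reports "not a dmpar build", WRF would need to be reconfigured with a dmpar option and recompiled before any batch script can spread the run across nodes.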

The &domains section of the namelist simply defines the domain on the globe. There are no settings in that section that determine a parallel vs. serial simulation.

If you have successfully built WRF with the dmpar option, you should get one rsl.out and one rsl.error file per processor. So if you request 6 processors, you should have 12 rsl files in total, and at the top of each one there should be a line giving the number of processors in the x and y directions (the product of those two numbers should be 6).
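To make that check from the shell, here is a minimal sketch (the function name check_rsl is hypothetical) that counts the rsl files in the run directory and multiplies the x and y task counts reported in rsl.error.0000. It assumes that line reads something like "Ntasks in X 2 , ntasks in Y 3"; the exact wording and spacing may differ between WRF versions.

```shell
#!/bin/bash
# Sketch: confirm one rsl.out/rsl.error pair per MPI rank and print the
# decomposition WRF chose. Run it inside the WRF run directory.
# ASSUMPTION: rsl.error.0000 has a line like "Ntasks in X 2 , ntasks in Y 3".
check_rsl() {
    local expected=$1
    local n_out n_err
    # $(( ... )) strips any padding that wc -l might add
    n_out=$(( $(find . -maxdepth 1 -name 'rsl.out.*' | wc -l) ))
    n_err=$(( $(find . -maxdepth 1 -name 'rsl.error.*' | wc -l) ))
    echo "rsl.out: $n_out, rsl.error: $n_err (expected $expected of each)"
    if [ "$n_out" -ne "$expected" ] || [ "$n_err" -ne "$expected" ]; then
        echo "WARNING: file count does not match the number of MPI ranks"
        return 1
    fi

    # Pull the two integers (tasks in X, tasks in Y) from the first match
    local -a nums
    nums=($(grep -i 'ntasks in x' rsl.error.0000 | head -n 1 | grep -oE '[0-9]+'))
    if [ "${#nums[@]}" -lt 2 ]; then
        echo "WARNING: no decomposition line found in rsl.error.0000"
        return 1
    fi
    local nx=${nums[0]} ny=${nums[1]}
    echo "decomposition: $nx x $ny = $(( nx * ny )) ranks"
    # Succeeds only if nx * ny equals the number of ranks you launched
    [ $(( nx * ny )) -eq "$expected" ]
}
```

For the 6-processor case, running "check_rsl 6" in the run directory should report 6 files of each type and a decomposition whose product is 6. A single rsl.out/rsl.error pair means only one MPI rank was started.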

In the past, when running in parallel on AWS, I've used batch scripts like this (e.g., runwrf.sh):

Code:
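#!/bin/bash -l

#SBATCH --job-name=WRF_test
# 15 nodes x 36 MPI ranks per node = 540 ranks in total
#SBATCH --nodes=15
#SBATCH --ntasks=540
#SBATCH --ntasks-per-node=36
# Pure MPI (dmpar): one core per rank, and nodes are not shared with other jobs
#SBATCH --cpus-per-task=1
#SBATCH --exclusive

# WRF needs a large stack; a small default limit can cause segfaults
ulimit -s unlimited
# Intel MPI: print process pinning/placement information at startup
export I_MPI_DEBUG=5

# -ppn (processes per node) is an Intel MPI / MPICH option
mpirun -np 540 -ppn 36 ./wrf.exe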

That used 15 nodes with 36 tasks per node, for a total of 540 processors. To submit it, I used:
Code:
sbatch runwrf.sh
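For the 6-node cluster in the question, a hypothetical adaptation might look like the sketch below. The 36 ranks per node is carried over from the example above and is an assumption: set it to the physical core count of your compute instances and keep --ntasks equal to nodes × ntasks-per-node. It drops the Intel MPI-specific -ppn flag and takes the rank count from Slurm's SLURM_NTASKS variable, assuming your mpirun picks up the Slurm allocation. It also runs real.exe in parallel before wrf.exe. The run_wrf function and the MPIRUN override are illustrative additions, not part of the original script.

```shell
#!/bin/bash -l
# Hypothetical 6-node variant of runwrf.sh (a sketch, not a tested recipe).
# ASSUMPTION: 36 ranks per node; change to your instance's physical core
# count and keep --ntasks = nodes x ntasks-per-node (here 6 x 36 = 216).
#SBATCH --job-name=WRF_6node
#SBATCH --nodes=6
#SBATCH --ntasks=216
#SBATCH --ntasks-per-node=36
#SBATCH --cpus-per-task=1
#SBATCH --exclusive

# Launch real.exe, then wrf.exe, on every rank Slurm allocated
run_wrf() {
    # MPIRUN may be overridden (e.g. with a stub command) for a dry run
    local launcher=${MPIRUN:-mpirun}
    # Take the rank count from Slurm instead of hard-coding it a second time
    local ranks=${SLURM_NTASKS:-216}
    "$launcher" -np "$ranks" ./real.exe || return 1
    "$launcher" -np "$ranks" ./wrf.exe
}

# Only launch inside a Slurm job, so the file can be sourced elsewhere safely
if [ -n "${SLURM_JOB_ID:-}" ]; then
    ulimit -s unlimited
    run_wrf
fi
```

It is submitted the same way, with sbatch. Pointing MPIRUN at a harmless command shows the launch lines without starting MPI.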