Small 1-domain WRF run on Derecho


I am setting up a small single-domain WRF run on Derecho. The domain is 121x121 grid points with 51 vertical levels. I compiled WRF both with dm+sm and with dm only. Since Derecho's nodes have 128 cores, and given the domain size, I can only use one node.

I tried the following:
1) Running the dm-only version, I get this error:

e_we = 121, nproc_x = 8, with cell width in x-direction = 15
e_sn = 121, nproc_y = 16, with cell width in y-direction = 7
--- ERROR: Reduce the MPI rank count, or redistribute the tasks.
-------------- FATAL CALLED ---------------
NOTE: 1 namelist settings are wrong. Please check and reset these options

Can a dm-only build be used at all with such a small domain? I tried setting nproc_x and nproc_y in the namelist, but I get the same error.

2) Running the dm+sm version, I submit the job with:
#PBS -l select=1:ncpus=128:mpiprocs=16:ompthreads=8
time mpiexec --cpu-bind depth -n 16 -ppn 16 -d 8 ./wrf.exe
(16 MPI ranks x 8 OpenMP threads seems to be the best MPI/OpenMP split for this domain size.) With this setup, a 60-h simulation takes about 100 minutes, which seems a bit long.
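The error in 1) appears to come from a per-patch size check. The sketch below is a hypothetical helper (not part of WRF) that lists every nproc_x x nproc_y split of a rank count and flags splits whose patches get too narrow. Two assumptions to check against your WRF version: the minimum of 10 cells per patch (MIN_CELLS), and the patch width approximated as e_we / nproc_x with integer division, which reproduces the widths 15 and 7 in the log above.

```sh
#!/bin/sh
# Hypothetical helper, not part of WRF.
# ASSUMPTIONS: MIN_CELLS=10 is our guess at WRF's minimum patch width, and
# patch width ~ e_we / nproc_x (integer division), matching 121/8=15, 121/16=7.
MIN_CELLS=${MIN_CELLS:-10}

# usage: list_decomps NTASKS E_WE E_SN
# Prints "nproc_x nproc_y width_x width_y status" for every exact factorization.
list_decomps() {
  ntasks=$1; e_we=$2; e_sn=$3
  nx=1
  while [ "$nx" -le "$ntasks" ]; do
    if [ $(( ntasks % nx )) -eq 0 ]; then   # nproc_x must divide the rank count
      ny=$(( ntasks / nx ))
      wx=$(( e_we / nx ))
      wy=$(( e_sn / ny ))
      if [ "$wx" -ge "$MIN_CELLS" ] && [ "$wy" -ge "$MIN_CELLS" ]; then
        status=ok
      else
        status=too_small
      fi
      echo "$nx $ny $wx $wy $status"
    fi
    nx=$(( nx + 1 ))
  done
}

# usage: valid_decomps NTASKS E_WE E_SN
# Prints only the usable layouts, as "nproc_xxnproc_y".
valid_decomps() {
  list_decomps "$@" | awk '$5 == "ok" { print $1 "x" $2 }'
}
```

Under these assumptions, valid_decomps 128 121 121 prints nothing: every factorization of 128 leaves at least one direction under 10 cells, which would explain why setting nproc_x/nproc_y by hand gave the same error. valid_decomps 121 121 121 prints 11x11.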

Is there anything I am missing, or anything I could do to speed things up, or is this the limit?
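Since the best rank/thread split has to be found by timing, a small hypothetical helper can generate the job lines for each split that fills one 128-core node, reusing the select and mpiexec options from 2). It only produces candidates to time and says nothing about which one is fastest. The 128 x 1 split is left out of the sweep because of the decomposition error in 1).

```sh
#!/bin/sh
# Hypothetical helper: print PBS select / mpiexec line pairs for dm+sm layouts.
# CORES_PER_NODE=128 is the Derecho node size quoted in the question; the
# mpiexec options are copied from the job in 2).
CORES_PER_NODE=${CORES_PER_NODE:-128}

# usage: hybrid_layout NRANKS NTHREADS
hybrid_layout() {
  nranks=$1; nthreads=$2
  # Only accept splits that use exactly one full node.
  if [ $(( nranks * nthreads )) -ne "$CORES_PER_NODE" ]; then
    echo "skip: $nranks x $nthreads != $CORES_PER_NODE cores" >&2
    return 1
  fi
  echo "#PBS -l select=1:ncpus=$CORES_PER_NODE:mpiprocs=$nranks:ompthreads=$nthreads"
  echo "mpiexec --cpu-bind depth -n $nranks -ppn $nranks -d $nthreads ./wrf.exe"
}

# Candidate sweep: 8x16, 16x8, 32x4, 64x2 (ranks x threads).
for t in 16 8 4 2; do
  hybrid_layout $(( CORES_PER_NODE / t )) "$t"
done
```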

Thank you very much for your help.
I have run WRF on Derecho with a smaller number of processors. Note that I built WRF in dmpar mode, so this answer only addresses your question (1). Please try a smaller number of processors, for example:

#PBS -l select=1:ncpus=16:mpiprocs=16
mpiexec -n 16 -ppn 16 ./wrf.exe

Let me know whether this works for you.

A similar strategy may apply to the code built in dm+sm mode, but I haven't tested that option myself. You could try it, and I would appreciate it if you let me know the result. Thanks in advance.
Thank you for your reply. You are right: I was able to use up to 121 cores on one node with:
#PBS -l select=1:ncpus=121:mpiprocs=121
Nevertheless, this is still much slower than the dm+sm WRF build with:
#PBS -l select=1:ncpus=128:mpiprocs=16:ompthreads=8

I am confused by the namelist options nproc_x and nproc_y and how to use them. I was hoping they might help here, or that you might have suggestions on how to use them for this case. Also, can these options be used with the dm+sm build?
Thank you very much for your help.
Thank you for the update.

Since the WRF domain is decomposed automatically, it is usually not necessary to specify nproc_x and nproc_y.

However, if you do want to set these options, suppose you run with 32 processors; then you can set the following in the &domains section of namelist.input:

nproc_x = 4
nproc_y = 8

In this case you will have 4 processors along the x-direction and 8 processors along the y-direction.
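Before setting nproc_x and nproc_y by hand, a quick check like the hypothetical check_nproc below can catch two mistakes. It assumes (our understanding, not stated in this thread) that nproc_x * nproc_y must equal the number of MPI ranks, so with dm+sm only mpiprocs counts and OpenMP threads do not. It also assumes a 10-cell minimum per patch and approximates the patch width as e_we / nproc_x; both should be checked against your WRF version.

```sh
#!/bin/sh
# Hypothetical pre-submit check, not part of WRF.
# ASSUMPTIONS: nproc_x*nproc_y must equal the MPI rank count (threads excluded),
# MIN_CELLS=10 is the minimum patch width, width ~ e_we / nproc_x.
MIN_CELLS=${MIN_CELLS:-10}

# usage: check_nproc NPROC_X NPROC_Y NRANKS E_WE E_SN
# Returns 0 if the layout looks usable, 1 otherwise.
check_nproc() {
  px=$1; py=$2; nranks=$3; e_we=$4; e_sn=$5
  if [ $(( px * py )) -ne "$nranks" ]; then
    echo "nproc_x*nproc_y = $(( px * py )), but $nranks MPI ranks requested"
    return 1
  fi
  wx=$(( e_we / px )); wy=$(( e_sn / py ))
  echo "patch width: x=$wx y=$wy"
  [ "$wx" -ge "$MIN_CELLS" ] && [ "$wy" -ge "$MIN_CELLS" ]
}
```

For the 121x121 domain, check_nproc 4 8 32 121 121 prints "patch width: x=30 y=15" and succeeds, while check_nproc 8 16 128 121 121 fails because the y width drops to 7.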