
Maximal speed of run

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.


Hi everyone
I'm very new to WRF and I have a question about getting the best performance from my WRF runs. I compiled with the DMPAR option and I want to run a real case with 46 x 46 cells (a small mesh), with 5 nests of similar size. I have an HPC system with 96 cores (AMD), and I run the case in parallel with MPICH on 16 cores (4 in X, 4 in Y), using mpirun -np 16 ./wrf.exe. With that I already get the finest possible decomposition (about 10 cells per processor in each direction), so I can't make the runs any faster by adding more MPI processes.

Do you know how I can shorten the run time while keeping the same domain sizes?

PS1: The objective is to get a time series of wind at one point over a very long period, so the run time is important.

PS2: I already tried a WRF build compiled with DMPAR+SMPAR, to use more processors by combining the MPICH processes with additional OpenMP threads (tiles), but I didn't get better performance.

I would be very grateful for any information.
Please take a look at this FAQ that discusses choosing a good number of processors for your domain set-up.

I also would like to point out that your domain size is too small for any real application. We recommend at least 100x100 per domain, and that's on the smaller end. Take a look at this web page for best practice recommendations for domain set-up.

Once you get your domain set-up reasonably, then try to figure out the best number of processors for your simulation. The time each simulation takes can be dependent upon a lot of things - such as the time_step you're using, the physics and dynamics options, whether you have anything else running on the system in the background, etc. We typically recommend a dmpar compile for best performance.
Thanks for your reply and corrections, they really help. But if I understand correctly, the model gets its best performance when there are 10 cells (elements) of the domain per processor? Also, are there any other recommendations on time steps, physics and dynamics options, or other features for speeding up the model runs?

I'm not sure whether it's accurate to say that you would get the best performance when you have 10 grid cells per direction for each processor. That is the minimum number you can have in each direction, per processor, and since that would be the max number of processors you can use, then perhaps it would give the best performance, but I can't say that with 100% certainty.
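
As a rough sketch of the arithmetic behind that limit, the snippet below estimates the largest processor count for a given set of domains, assuming the minimum of 10 grid cells per direction per processor described above. The function name is hypothetical, and the assumption that the smallest domain sets the limit for a nested run (because the same decomposition is applied to every domain) should be checked against the FAQ.

```python
def max_mpi_tasks(domains, min_cells=10):
    """Estimate the largest X/Y decomposition for a list of (e_we, e_sn) domain sizes.

    Assumes each processor needs at least `min_cells` grid cells per direction,
    and that the smallest domain limits the whole nested run (an assumption).
    """
    nx = min(max(1, e_we // min_cells) for e_we, _ in domains)
    ny = min(max(1, e_sn // min_cells) for _, e_sn in domains)
    return nx, ny, nx * ny


# The 46 x 46 case from the question: 4 tasks in X, 4 in Y, 16 in total.
print(max_mpi_tasks([(46, 46)]))

# A 100 x 100 domain, the recommended minimum size, allows up to 10 x 10 tasks.
print(max_mpi_tasks([(100, 100)]))
```

This only gives an upper bound on the number of processors. Whether that upper bound is also the fastest configuration still has to be measured on the actual system.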

As for time_step options, you cannot have a time_step (in seconds) greater than 6xDX, with DX in km. If you're able to run with that setting (i.e., it isn't too large and doesn't produce CFL errors), then you should choose that value. Physics and dynamics options should be chosen based on your specific application and what you are interested in, not on which ones run the fastest. While a quick simulation is nice, accurate results are more important.
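
A minimal sketch of the 6xDX rule, taking time_step in seconds and DX in km as in the standard WRF guidance. The function name and the grid spacings in the example are hypothetical, since the thread does not state the actual DX of the domains.

```python
import math


def max_time_step(dx_m, factor=6):
    """Upper limit for time_step in seconds, given grid spacing dx_m in meters.

    Applies time_step <= factor * DX with DX in km. A run may still need a
    smaller value if it produces CFL errors.
    """
    return math.floor(factor * dx_m / 1000.0)


# Hypothetical grid spacings in meters, not taken from the thread.
for dx_m in (9000, 3000, 1000):
    print(dx_m, "m ->", max_time_step(dx_m), "s")
```

The result is an upper limit for the coarsest domain only. If the run fails with CFL errors, reduce it.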