
MPAS limited area 3x slower than WRF regional run

davidovens

New member
I have set up an MPAS 8.3.1 4-km limited-area run with 38 vertical levels and a 21,000 m model top using the convection_permitting physics suite, and I am comparing it to a WRF 36/12/4-km run whose 4-km domain has a nearly identical boundary (I used 20 points from the WRF boundaries to cut out the area from the MPAS 4-km quasi-uniform mesh x1.36864002.static.nc).

I compiled MPAS on our Linux cluster with Intel 19.1.3.304 (2020) and OpenMPI 5.0.3, using "-O3 -xHost" optimizations just as I do for WRF. My WRF 4.1.3 runs are compiled and run with older Intel and OpenMPI versions. The 24-s time steps for the 4-km domain take 0.55 wallclock seconds in WRF but 1.58 wallclock seconds in MPAS.

Has anyone else compared WRF and MPAS timing, and if so, what kind of results did you see? I have included log.atmosphere.0000.out and rsl.out.0000 extracts for comparison.
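For context, here is a rough sketch of how the regional cut-out and the build might look (the create_region call follows the MPAS-Limited-Area tool's basic points-file usage and the points-file name is a placeholder; the -O3 -xHost flags would be applied by editing the ifort FFLAGS in the top-level Makefile):

# carve the regional area out of the 4-km quasi-uniform global mesh
# (MPAS-Limited-Area tool; wrf_d03_boundary.pts is a hypothetical name)
create_region wrf_d03_boundary.pts x1.36864002.static.nc

# build MPAS-A with the Intel compiler and OpenMPI wrappers
make ifort CORE=atmosphere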
 

Attachments

  • log.atmosphere.0000.out.txt (20.7 KB)
  • rsl.out.0000.txt (27.9 KB)
Can you tell me the number of grid cells in your regional MPAS mesh? It seems that you have 38 vertical levels; please confirm this is correct.

How many grid points do you have in your WRF domains? Did you also run with 38 vertical levels?

Please upload your namelist.input (for WRF) and namelist.atmosphere (for MPAS) so I can take a look. I also need to know how many processors you used to run the two models.

Thanks.
 
There are 150,412 cells in the MPAS mesh; WRF has 405x282 = 114,210 grid points. MPAS has 38 levels; WRF has bottom_top = 37 (bottom_top_stag = 38). I used all 32 processors on a 32-CPU machine. Here are the namelists.
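For scale, that works out to roughly 150,412 / 32 ≈ 4,700 MPAS cells per MPI rank versus 114,210 / 32 ≈ 3,570 WRF grid points per rank, i.e. the MPAS run carries about 1.32 times as many columns per processor.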
 

Attachments

  • namelist.atmosphere.txt (2 KB)
  • namelist.input.txt (16.2 KB)
Thank you for uploading the namelist files. For a more accurate comparison of computation cost between the two models, the physics and dynamics options should be as close as possible. In your MPAS and WRF runs, the microphysics, radiation intervals, land surface scheme, and some dynamics options differ. I believe these options affect the efficiency of model integration and contribute to the cost differences you have seen.

The compiler used to build the model executables, as well as the optimization flags used with that compiler, may also affect computational cost.

Hope this is helpful for you. Please let me know if you have more questions regarding this issue.
 
Looking at the User's Guide description of the physics packages in the 'convection_permitting' physics suite, I ran a new test matching the physics between WRF and MPAS using the newly attached namelist.input and namelist.atmosphere files. Here are the key settings:

WRF                              MPAS
---                              ----
Grell-Freitas (cu_physics=3)     cu_grell_freitas
Thompson (mp_physics=8)          Thompson (non-aerosol-aware)
Noah (sf_surface_physics=2)      Noah
MYNN (bl_pbl_physics=5)          MYNN
MYNN (sf_sfclay_physics=5)       MYNN
RRTMG (ra_lw_physics=4)          RRTMG
RRTMG (ra_sw_physics=4)          RRTMG
Xu-Randall (icloud=1)            Xu-Randall
radt = 15, 15, 15                config_radtlw_interval = '00:15:00'
gwd_opt = 0                      config_gwdo_scheme = 'off'

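For reference, on the WRF side those choices correspond to a &physics block roughly like the sketch below (values taken from the table above, listed per domain; the attached namelist.input is authoritative):

&physics
 mp_physics         = 8,  8,  8,
 cu_physics         = 3,  3,  3,
 ra_lw_physics      = 4,  4,  4,
 ra_sw_physics      = 4,  4,  4,
 radt               = 15, 15, 15,
 bl_pbl_physics     = 5,  5,  5,
 sf_sfclay_physics  = 5,  5,  5,
 sf_surface_physics = 2,  2,  2,
 icloud             = 1,
 gwd_opt            = 0,
/

On the MPAS side, the matching physics presumably comes from config_physics_suite = 'convection_permitting' in namelist.atmosphere, with the radiation interval and GWDO settings from the table set explicitly.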
I also compiled WRF 4.1.3 with exactly the same Intel compiler and OpenMPI version that I used for MPAS 8.3.1, and I ran the WRF and MPAS tests on the same machines. The MPAS limited-area tool (applied to the quasi-uniform 4-km global mesh) produces 150,412 cells versus WRF's 114,210 (405x282) grid points, about 32% more cells. As you can see from the attached log.atmosphere.0000.out and rsl.out.0000 files, typical time-step integrations are:
MPAS:
Begin timestep 2025-09-21_00:02:00
Timing for integration step: 1.50829 s
WRF:
Timing for main: time 2025-09-21_00:02:00 on domain 3: 0.54765 elapsed seconds

That means MPAS is 2.75 times slower than WRF. If we account for the cell-count discrepancy (150,412/114,210), MPAS is still 2.09 times slower than WRF.
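For reproducibility, the per-step averages can be pulled from the two logs with something like the shell sketch below (it assumes the log-line formats quoted above; the exact "on domain" spacing in rsl.out.0000 may differ, so the grep pattern may need adjusting):

# average MPAS wallclock seconds per time step
grep "Timing for integration step" log.atmosphere.0000.out | \
  awk '{s += $(NF-1); n++} END {printf "MPAS: %.3f s/step over %d steps\n", s/n, n}'

# average WRF wallclock seconds per time step on the 4-km domain (d03)
grep "Timing for main" rsl.out.0000 | grep "domain 3:" | \
  awk '{s += $(NF-2); n++} END {printf "WRF d03: %.3f s/step over %d steps\n", s/n, n}'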

Another way to compare is simply the total time to run 6 hours. The WRF 36/12/4-km run completes in 687 seconds, while MPAS takes 1812 seconds: 2.64 times as long, or 2.0 times as long when accounting for the cell-count discrepancy.

My conclusion is that MPAS is 2x slower than WRF.
 

Attachments

  • namelist.input.txt (15 KB)
  • namelist.atmosphere.txt (2 KB)
  • log.atmosphere.0000.out.txt (527.9 KB)
  • rsl.out.0000.txt (122.8 KB)
Please confirm that your MPAS run uses 38 vertical levels and your WRF run uses 37.

I don't think MPAS should be 2x slower than WRF, although I don't have an immediate explanation for what you are seeing.

I will talk to our software engineers and we will get back to you. Thank you for your patience.
 
Hi David,

Thanks for your question! So comparing the time taken between WRF and MPAS for your setup might be a little tricky. There are several aspects:

1. Firstly, MPAS uses an unstructured mesh and WRF uses a structured grid. The numerical methods in a structured-grid solver like WRF are expected to be somewhat faster than in an unstructured-mesh solver like MPAS. This probably doesn't explain all of the slowdown you are seeing with MPAS.

2. I am not very familiar with WRF, but from what I gather you have a 3-domain 36/12/4-km grid with a different time-step size for each domain, whereas MPAS is a quasi-uniform 4-km mesh with a constant time step everywhere. If you're comparing the time taken for each time advancement (1.50829 s for MPAS vs 0.54765 s for WRF), then you probably need to account for the time taken across all three domains in WRF, not just domain 3.

Now, the total time taken to run 6 hours is perhaps a more meaningful measure, but only if both models take a similar number of time steps and call similar radiation and physics schemes at similar intervals. It's probably also better for the comparison to subtract the time spent on file I/O from the total.

3. A 32% higher cell count in the MPAS mesh may not increase the run time proportionally (relative to WRF). A cleaner comparison would use grids that are as similar as possible between the two models, ideally (quasi-)uniform in both.

4. Could you also share the job script snippet or the command you use to run both models? I'm wondering whether the WRF run uses OpenMP threading on top of MPI.

There are probably more aspects I'm forgetting, but let us know if any of these ideas help.
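As a concrete check on the step counting in point 2 (using the 24-second time steps quoted in the first post): 6 h = 21,600 s, so both the MPAS mesh and the WRF 4-km domain take 21,600 / 24 = 900 steps over the window; the extra cost of WRF's 36- and 12-km domains, any difference in physics and radiation call frequency, and file I/O then account for the rest of the gap in total runtime.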
 
As shown in the namelist.input.txt file, e_vert = 38 for WRF (37 half levels and 38 full levels). For MPAS, log.atmosphere.0000.out.txt indicates
nVertLevelsP1 = 38
nVertLevels = 37
So I think I have the same number of vertical levels for both model runs. If the software engineers are able to do similar WRF and MPAS tests on NCAR machines and on any given limited-area domain with a corresponding WRF domain and they come up with much better performance for MPAS than what I see, I am happy to work with them to try to figure out what is wrong with my setup. I have decades of experience with WRF, but only days of experience with MPAS.
 
Replying to the numbered points above:
1. Personally, I think this might be the culprit, perhaps combined with differences in how MPAS handles the boundaries.
2. I am quoting the wallclock times of the WRF 4-km time step (24-second model time step) and the MPAS 4-km time step (also a 24-second model time step) and am not including the WRF 36- and 12-km time steps, but you can see them in the rsl.out.0000.txt file included above. If I ran a single WRF domain, WRF would run faster and MPAS would look even slower.
3. I just completed another MPAS run on a 4-km mesh with 114,775 cells (much closer to the WRF count), and it took about 1.19 seconds per integration time step -- still a factor of 2.1 slower than WRF's 0.54 seconds per time step.
4. mpirun -np 32 $executable was used for both WRF and MPAS. I do not use OpenMP; I only use OpenMPI (MPI).
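If it helps rule out placement or threading effects, here is a sketch of making both runs fully explicit (OMP_NUM_THREADS only matters if an executable was built with OpenMP support, which is not the case here; --bind-to core and --report-bindings are standard OpenMPI mpirun options, though their handling changed somewhat in the 5.x series):

export OMP_NUM_THREADS=1    # no-op for MPI-only builds, but makes the intent explicit
mpirun -np 32 --bind-to core --report-bindings ./wrf.exe
mpirun -np 32 --bind-to core --report-bindings ./atmosphere_model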
 
As confirmation of what I mentioned above, I ran a single-domain 4-km WRF run with dimensions of 409x286 (116,974 grid points), which is larger than the 114,775-cell MPAS mesh mentioned above. Its time-step integrations take 0.56 seconds, compared to MPAS's 1.19 seconds for 114,775 cells.
 
Thanks for checking! I think our team may also have some performance data points for WRF vs MPAS; we will share them as soon as we can.

Another thing that can help is looking at the timers for each region (dycore, physics, I/O, etc.) to see which one consumes the most time, though I'm not sure whether WRF provides this breakdown.

Looking at the performance of a dycore-only test, such as the Jablonowski-Williamson baroclinic wave test, could also help narrow down the source of the slowdown.
 
@davidovens I think some of the dynamics settings in your MPAS-A configuration may be one place to look. Could you try setting the following in your &nhyd_model namelist group?
config_time_integration_order = 3
config_dt = 24.0
config_split_dynamics_transport = false
config_number_of_sub_steps = 6
config_dynamics_split_steps = 1
config_coef_3rd_order = 0.25
config_relax_zone_divdamp_coef = 4.0
With a 4-km MPAS-A mesh, I think the above (in particular, the 24 s time step) should be stable, but if not, reducing the time step a bit might be the first adjustment to try.
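In namelist form, that suggestion would look like the fragment below inside namelist.atmosphere (only the &nhyd_model group is shown; everything else stays as in the attached file):

&nhyd_model
    config_time_integration_order   = 3
    config_dt                       = 24.0
    config_split_dynamics_transport = false
    config_number_of_sub_steps      = 6
    config_dynamics_split_steps     = 1
    config_coef_3rd_order           = 0.25
    config_relax_zone_divdamp_coef  = 4.0
/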
 
Attached are a few slides showing a recent comparison of WRF v4.7.1 and MPAS-A v8.3.1 run on Derecho, where MPAS-A was between 16% slower and 5% faster than WRF. All details of the domain and model configurations should be captured in the slides, but if there are any additional details that would be helpful, I'll be glad to provide them.
 

Attachments

  • wrf_mpas_scaling.pdf (1.7 MB)
I tried those settings, and they sped up MPAS-A by about 29%, but that still leaves it taking about 1.6x as long as WRF for me (0.855 seconds per time step vs 0.534, for single-domain experiments with nearly the same number of cells: 114,775 for MPAS vs 114,898 for WRF). If you can share the namelist.input and namelist.atmosphere files from the scaling experiments in the slides above, I can run your experiments on my single 32-core machine and see what I get.
 