
Cheyenne WRF performance issue

This post is from a previous version of the WRF & MPAS-A Support Forum. New replies have been disabled. If you have follow-up questions related to this post, please start a new thread from the forum home page.


New member
I am noticing a dramatic performance drop on Cheyenne on my WRF runs which I cannot reasonably explain.

In June 2020, I ran a benchmark in /glade/u/home/avijit/work/test/conus-128/ where the performance was under 1 s/timestep, which was very good. In January 2021, I re-ran the same benchmark in /glade/scratch/avijit/conus-test/conus-128, and the performance was 134 s/timestep, a degradation of roughly 200x. There must be an explanation for this.

Both runs use the same software stack: the same binary and modules (see conus-test.pbs), the same namelist, and the aforementioned submit script. The binary is /glade/u/home/avijit/work/wrf/WRF-4.1.3/bin/wrf.exe.n-hb. For both runs, the forcing files were generated on another system and ported over due to space constraints. Others in our research group have noticed the same kind of performance issue with the same software stack, but for different runs.
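For comparing the two runs, one quick sanity check is to pull the per-step timings straight out of each run's rsl log, since WRF prints a "Timing for main" line with the elapsed seconds for every model step. A minimal sketch with awk (the sample log lines here are illustrative, not taken from the actual runs):

```shell
# Create a small illustrative rsl.out.0000 with the format WRF emits;
# in a real run this file already exists in the run directory.
cat > rsl.out.0000 <<'EOF'
Timing for main: time 2021-01-01_00:00:10 on domain   1:    0.80000 elapsed seconds
Timing for main: time 2021-01-01_00:00:20 on domain   1:    0.90000 elapsed seconds
Timing for main: time 2021-01-01_00:00:30 on domain   1:    1.00000 elapsed seconds
EOF

# Average the "elapsed seconds" field (third-from-last) over all steps.
awk '/Timing for main/ {sum += $(NF-2); n++}
     END {if (n) printf "mean %.2f s/timestep over %d steps\n", sum/n, n}' rsl.out.0000
# → mean 0.90 s/timestep over 3 steps
```

Running this against the rsl.out.0000 from each run directory would confirm whether the slowdown is uniform across the whole run or concentrated in particular steps (e.g. I/O-heavy ones).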

We'd appreciate it if you could point to a reasonable explanation for why a performance drop of over 200x occurred in six months on the same software stack, or identify the problem so we can work around it.

-- Avi
Hi Avi,
If you are using the exact same WRF code for these two tests, then this doesn't appear to be an issue with WRF itself, and unfortunately our team is unlikely to be able to help. I suggest contacting the CISL support group, which manages the Cheyenne system.
Hi Avi,
Were you able to resolve this issue? I've been attempting to run coupled WRF simulations on Cheyenne, with WRF coupled to the MITgcm ocean model via the ESMF coupler, and I'm currently seeing the WRF component of the coupled model scale poorly on Cheyenne. Specifically, the model doesn't seem to scale across multiple nodes, although performance is roughly 4-5 s/timestep, so not quite as poor as in your example.
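For the cross-node scaling question, one thing worth ruling out is whether the PBS job is actually distributing MPI ranks across nodes as intended. A minimal sketch of a Cheyenne-style PBS header (the job name, project code placeholder, and queue are assumptions based on typical Cheyenne usage, not taken from this thread):

```shell
#!/bin/bash
#PBS -N wrf_scaling
#PBS -A <project_code>
#PBS -q regular
#PBS -l walltime=01:00:00
# Request 4 nodes with 36 MPI ranks each (Cheyenne nodes have 36 cores);
# if mpiprocs is omitted or wrong, ranks may pile onto fewer nodes.
#PBS -l select=4:ncpus=36:mpiprocs=36

# Launch WRF with the system MPT launcher used on Cheyenne.
mpiexec_mpt ./wrf.exe
```

Checking `qstat -f <jobid>` for the assigned exec_host list, or printing the hostname from each rank, would confirm the ranks are actually spread over all four nodes.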