Cheyenne WRF performance issue

avipwrfhelp · Jan 25, 2022

Hello,
I am noticing a dramatic performance drop in my WRF runs on Cheyenne that I cannot reasonably explain.

In June 2020, I ran a benchmark in /glade/u/home/avijit/work/test/conus-128/, where the performance was <1 s/timestep, which was very good. I recently ran the same benchmark again in January 2021, in /glade/scratch/avijit/conus-test/conus-128, and the performance is now 134 s/timestep, a degradation of about 200x. There is probably an explanation for this.
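For anyone trying to reproduce the comparison, here is a minimal sketch that computes mean seconds per timestep and the slowdown between two runs. It assumes the timings come from WRF's per-step "Timing for main: ... elapsed seconds" lines in rsl.out.0000 / rsl.error.0000; check that the regex matches your own logs. The function names are hypothetical helpers, and the sample lines in the usage section are made-up placeholders, not data from these runs.

```python
import re
from statistics import mean

# Assumed format of WRF per-step timing lines, e.g.
# "Timing for main: time 2019-01-01_00:00:10 on domain   1:    0.67000 elapsed seconds"
TIMING_RE = re.compile(
    r"Timing for main: time \S+ on domain\s+(\d+):\s+([0-9.]+) elapsed seconds"
)


def seconds_per_timestep(lines, domain=1):
    """Mean wall-clock seconds per model timestep for one domain."""
    times = []
    for line in lines:
        m = TIMING_RE.search(line)
        if m and int(m.group(1)) == domain:
            times.append(float(m.group(2)))
    if not times:
        raise ValueError(f"no 'Timing for main' lines found for domain {domain}")
    return mean(times)


def slowdown(old_lines, new_lines, domain=1):
    """Ratio of new to old seconds/timestep (values > 1 mean the new run is slower)."""
    return seconds_per_timestep(new_lines, domain) / seconds_per_timestep(old_lines, domain)


if __name__ == "__main__":
    # Placeholder lines for illustration only; in practice read them from the
    # rsl files, e.g. open("rsl.error.0000").readlines().
    old = ["Timing for main: time 2019-01-01_00:00:10 on domain   1:    0.67000 elapsed seconds"]
    new = ["Timing for main: time 2019-01-01_00:00:10 on domain   1:  134.00000 elapsed seconds"]
    print(f"slowdown: {slowdown(old, new):.0f}x")
```

Averaging over many steps rather than quoting a single step makes the two runs easier to compare fairly.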

Both runs use the same software stack (binary and modules; see conus-test.pbs), the same namelist, and the aforementioned submit script. The binary is /glade/u/home/avijit/work/wrf/WRF-4.1.3/bin/wrf.exe.n-hb. For both runs, the forcing files were generated on another system and copied over because of space constraints. Others in our research group have also noticed this kind of performance issue with the same software stack, but on different runs.

We'd appreciate it if you could point to a reasonable explanation for why performance dropped by over 200x within 6 months on the same software stack, or tell us what the problem is so we can work around it.

Thanks
-- Avi

kwerner · Jan 27, 2022

Hi Avi,
If you are using exactly the same WRF code for these two tests, then it doesn't seem to be an issue with WRF, and unfortunately our team is unlikely to be able to help. I suggest contacting the CISL support group, which manages the Cheyenne system.

kshack · Jun 6, 2022

Hi Avi,
Were you able to resolve this issue? I've been attempting to run coupled simulations on Cheyenne, with WRF coupled to the MITgcm ocean model through the ESMF coupler, and the WRF component of the coupled model is not scaling well on Cheyenne. Specifically, the model doesn't seem to scale across multiple nodes, although performance is roughly 4-5 s/ts, so not nearly as bad as your example.
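One way to quantify "not scaling across nodes" is to time the same case on several node counts and compute speedup and parallel efficiency relative to the smallest run. The sketch below is a hypothetical helper (not part of WRF or ESMF); it assumes you already have seconds/timestep for each node count, and the numbers in the usage section are invented placeholders.

```python
from typing import Dict, Tuple


def scaling_table(timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map node count -> (speedup, parallel efficiency).

    `timings` maps node count to seconds per timestep. Both metrics are
    relative to the smallest node count in the input.
    """
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    table = {}
    for nodes, t in sorted(timings.items()):
        speedup = base_time / t
        # Ideal speedup is nodes / base_nodes; efficiency is the fraction achieved.
        efficiency = speedup / (nodes / base_nodes)
        table[nodes] = (speedup, efficiency)
    return table


if __name__ == "__main__":
    # Placeholder timings (s/timestep) for illustration only.
    measured = {1: 4.8, 2: 4.5, 4: 4.3}
    for nodes, (s, e) in scaling_table(measured).items():
        print(f"{nodes} node(s): speedup {s:.2f}, efficiency {e:.0%}")
```

Efficiency that falls steeply as nodes are added, with seconds/timestep barely changing, is the pattern described above.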
