avipwrfhelp
New member
Hello,
I am noticing a dramatic performance drop on Cheyenne on my WRF runs which I cannot reasonably explain.
In June 2020, I ran a benchmark in /glade/u/home/avijit/work/test/conus-128/, where the performance was <1s/timestep, which was very good. I ran the same benchmark again recently, in January 2021, in /glade/scratch/avijit/conus-test/conus-128, and the performance is now 134s/ts, a degradation of about 200x. There is presumably an explanation for this.
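For anyone who wants to reproduce the comparison, here is a minimal sketch of how the per-timestep figures could be pulled from the run logs. It assumes the timings come from the "Timing for main" lines that WRF normally writes to rsl.out.0000 or rsl.error.0000. The function names and the sample lines are hypothetical and for illustration only; they are not taken from the actual runs.

```python
import re

# Assumed log line format (an assumption about the rsl.* output, not copied from these runs):
#   Timing for main: time 2020-06-01_00:00:18 on domain   1:    0.50000 elapsed seconds
TIMING_RE = re.compile(
    r"Timing for main: time \S+ on domain\s+(\d+):\s+([\d.]+) elapsed seconds"
)


def mean_step_seconds(log_text, domain=1):
    """Return the mean elapsed seconds per timestep for one domain, or None if no lines match."""
    times = [
        float(m.group(2))
        for m in TIMING_RE.finditer(log_text)
        if int(m.group(1)) == domain
    ]
    return sum(times) / len(times) if times else None


def slowdown(fast_log_text, slow_log_text, domain=1):
    """Return how many times slower the second run is per timestep, or None if either log has no timings."""
    fast = mean_step_seconds(fast_log_text, domain)
    slow = mean_step_seconds(slow_log_text, domain)
    if not fast or not slow:
        return None
    return slow / fast


# Synthetic sample lines, only to show the expected input shape.
sample_fast = (
    "Timing for main: time 2020-06-01_00:00:18 on domain   1:    0.50000 elapsed seconds\n"
    "Timing for main: time 2020-06-01_00:00:36 on domain   1:    1.50000 elapsed seconds\n"
)
sample_slow = (
    "Timing for main: time 2021-01-01_00:00:18 on domain   1:  100.00000 elapsed seconds\n"
)
```

In practice the log text would be read from the rsl file of each run directory, for example with `open(path).read()`. The first timestep often includes setup overhead, so it may be worth excluding it before averaging.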
Both runs use the same software stack: the same binary and modules (see conus-test.pbs), the same namelist, and the aforementioned submit script. The binary is /glade/u/home/avijit/work/wrf/WRF-4.1.3/bin/wrf.exe.n-hb. For both runs, the forcing files were generated on another system and copied over because of space constraints. Others in our research group have also noticed this kind of performance issue with the same software stack, but on different runs.
We'd appreciate it if you could point us to a reasonable explanation of why a performance drop of over 200x occurred in 6 months on the same software stack, or tell us what the problem is so that we can work around it.
Thanks
-- Avi