Hello everyone,
I am running a WRF model with publicly available GFS data from https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20230313/00/atmos/. wrf.exe completed successfully in 34 hours using 20 MPI processes. I tried to reduce the runtime by increasing the number of MPI processes, but it did not run any faster. While debugging this, I found that wrf.exe stops scaling beyond 8 MPI processes: up to 8 processes it scales linearly, but with 8 processes the runtime is already 34 hours, and adding more processes does not reduce it. I also could not find any system-related issues (insufficient memory, etc.). I am very new to WRF and am trying to understand whether any configuration parameters, such as those in namelist.input, could be causing this scalability issue. The expected runtime was around 3 to 4 hours. I would very much appreciate any help in solving this problem.
All the details are listed below. I am also attaching the namelist.input and rsl.out.0000 files.
WRF v4.2.1 (configured with the dmpar option) and WPS v4.2
Input data: 48 files, each ~500 MB (gfs.t00z.pgrb2.0p25.f000 to gfs.t00z.pgrb2.0p25.f048), from the above-mentioned URL
Server: dual-socket server with 2 x Intel Xeon Gold 6230 CPUs (20 cores, 40 threads, 2.1 GHz) and 120 GB RAM (memory utilization stays below 40 GB during the run)
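To make runs with different MPI process counts easier to compare, here is a rough sketch of how the per-step timings in rsl.out.0000 could be summarized. It assumes the file contains the usual "Timing for main: time ... on domain N: X elapsed seconds" lines. The function names step_timings and summarize are made up for this illustration and are not part of WRF.

```python
import re

# Assumed format of WRF per-step timing lines in rsl.out.* files, e.g.
#   Timing for main: time 2023-03-13_00:00:18 on domain   1:    3.52130 elapsed seconds
TIMING_RE = re.compile(
    r"Timing for main: time (\S+) on domain\s+(\d+):\s+([\d.]+) elapsed seconds"
)


def step_timings(lines):
    """Return {domain_id: [elapsed_seconds, ...]} for every 'Timing for main' line."""
    per_domain = {}
    for line in lines:
        match = TIMING_RE.search(line)
        if match:
            domain = int(match.group(2))
            per_domain.setdefault(domain, []).append(float(match.group(3)))
    return per_domain


def summarize(per_domain):
    """Return {domain_id: (step_count, mean_seconds, total_seconds)}."""
    return {
        domain: (len(times), sum(times) / len(times), sum(times))
        for domain, times in per_domain.items()
    }
```

Calling summarize(step_timings(open("rsl.out.0000"))) for the 8-process and 20-process runs and comparing the mean seconds per step for each domain should show whether the time stepping itself stops speeding up, or whether the time is going elsewhere (for example, output writes, which are logged on separate lines that this sketch ignores).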
Thank you