I built WRF following the WRF Compilation Tutorial, selecting dmpar during configuration, and was able to run the tropical cyclone case in under an hour (45 to 55 minutes) using 20 cores on a Skylake cluster. The problem is that when I configure WRF with dm+sm instead, the same run takes more than 250 times as long (a little less than 11 days). Both runs use the default settings for the idealized case mentioned above, on the same cluster, with the same number of cores selected when launching WRF (mpirun -np 20 ./wrf.exe).
Should I be setting a limit to the OpenMP threads? Should I be modifying the default tiles setting?
Any help would be greatly appreciated.
- System env tests: passed.
- Libraries built: netcdf and mpich.
- Library compatibility test: passed.
- Build WRF 4.12: successful using em_tropical_cyclone.
- WRF run: successful; dm took 45-55 minutes, dm+sm took 250+ hours.
- Compiler: intel.
- Other existing libraries: openmpi 4.0.0.
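To make the thread-limit question concrete, here is a minimal sketch of what I think may be worth trying. It assumes (not confirmed for this cluster) that the dm+sm slowdown comes from each of the 20 MPI ranks starting its own set of OpenMP threads, so that ranks times threads exceeds the 20 cores. OMP_NUM_THREADS is the standard OpenMP environment variable; the helper name threads_per_rank and the 5 x 4 split are only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: pick an OpenMP thread count so that
# MPI ranks x OpenMP threads does not exceed the cores given to the job.
threads_per_rank() {
    total_cores=$1   # cores allocated to the job (20 in this post)
    mpi_ranks=$2     # value passed to mpirun -np
    threads=$((total_cores / mpi_ranks))
    # Never go below one thread per rank.
    [ "$threads" -lt 1 ] && threads=1
    echo "$threads"
}

# Same launch as in the post: 20 ranks on 20 cores leaves 1 thread per rank.
OMP_NUM_THREADS=$(threads_per_rank 20 20)
export OMP_NUM_THREADS
echo "mpirun -np 20 ./wrf.exe   (with OMP_NUM_THREADS=$OMP_NUM_THREADS)"

# Illustrative hybrid split on the same 20 cores: 5 ranks x 4 threads.
echo "OMP_NUM_THREADS=$(threads_per_rank 20 5) mpirun -np 5 ./wrf.exe"
```

The script only prints the launch lines rather than running WRF. Whether an exported variable reaches every rank depends on the MPI launcher, so it may need to be passed explicitly (for example with -x OMP_NUM_THREADS under Open MPI or -genv OMP_NUM_THREADS under MPICH).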