Hello,
I am setting up a small single-domain WRF run on Derecho. The domain is 121x121 grid points with 51 vertical levels. I compiled WRF both dm+sm and dm-only. Since Derecho's nodes have 128 cores each, and given the domain size, I can only use one node.
I tried the following:
1) Running the dm-only build, which fails with:
e_we = 121, nproc_x = 8, with cell width in x-direction = 15
e_sn = 121, nproc_y = 16, with cell width in y-direction = 7
--- ERROR: Reduce the MPI rank count, or redistribute the tasks.
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 2543
NOTE: 1 namelist settings are wrong. Please check and reset these options
Can the dm-only build be used at all with such a small domain? I tried setting nproc_x and nproc_y in the namelist, but I get the same error.
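Just to sanity-check the arithmetic from the log: the reported widths are the integer division of the domain by the rank counts in each direction (the minimum width WRF actually enforces depends on the version, so I only reproduce the numbers here):

```shell
# Patch width per direction is roughly e_we/nproc_x (integer division),
# matching the values WRF printed in the error above.
e_we=121; e_sn=121
nproc_x=8; nproc_y=16
echo "x-direction width: $(( e_we / nproc_x ))"   # prints 15
echo "y-direction width: $(( e_sn / nproc_y ))"   # prints 7 (too narrow)
# An 8x8 split (64 ranks) would give 15x15 patches instead:
echo "y width with nproc_y=8: $(( e_sn / 8 ))"    # prints 15
```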
2) Running the dm+sm build, submitted with:
#PBS -l select=1:ncpus=128:mpiprocs=16:ompthreads=8
time mpiexec --cpu-bind depth -n 16 -ppn 16 -d 8 ./wrf.exe
(16 MPI ranks x 8 OpenMP threads seems to be the best decomposition for this size). A 60-hour simulation then takes about 100 minutes of wall time, which seems a bit long.
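In case it is useful, this is how I generate the run commands when sweeping one-node rank/thread splits (the 32x4 and 64x2 combinations are just guesses I tried, not recommendations; the mpiexec flags are the same as above):

```shell
# Print (and optionally time) mpiexec commands for a few 128-core splits.
# The rank/thread pairs below are illustrative guesses, not recommendations.
for combo in "16 8" "32 4" "64 2"; do
    set -- $combo
    ranks=$1; threads=$2
    cmd="mpiexec --cpu-bind depth -n $ranks -ppn $ranks -d $threads ./wrf.exe"
    echo "$cmd"
    # time $cmd   # uncomment to actually run each case on Derecho
done
```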
Is there anything I am missing that could speed things up, or is this the limit?
Thank you very much for your help.