
WRF V4.0 MPI run does not work with too many processors

This post is from a previous version of the WRF & MPAS-A Support Forum. New replies have been disabled. If you have follow-up questions related to this post, please start a new thread from the forum home page.

LarryTian

New member
Hello,

I encountered an error while running wrf.exe, with the same namelist and the same mpirun command, after updating WRF to version 4.0.
In previous WRF versions, I didn't get such an error.
In version 4.0, when I set the number of processors to 48, it works.
However, when I try to run wrf.exe with more processors, e.g., 72, it fails with the following error report:

*************************************
For domain 1 , the domain size is too small for this many processors, or the decomposition aspect ratio is poor.
Minimum decomposed computational patch size, either x-dir or y-dir, is 10 grid cells.
e_we = 80, nproc_x = 8, with cell width in x-direction = 10
e_sn = 80, nproc_y = 9, with cell width in y-direction = 8
--- ERROR: Reduce the MPI rank count, or redistribute the tasks.
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 1730
NOTE: 1 namelist settings are wrong. Please check and reset these options
*************************************

I hope someone can help me; any tips are welcome.
Many thanks in advance.
 
Hi,
Beginning in V4.0, a check was implemented to make sure that you are not using too many processors. Your domain is divided into smaller patches, based on the number of processors you are using and the size of your domains. Each of those patches cannot be narrower than 10 grid cells in either the x- or y-direction. Take a look at this FAQ for more information about how this works, and for guidance on choosing a "good" number of processors.
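
As a rough illustration (a stand-alone sketch, not the actual WRF source; it assumes the near-square factorization shown in the error above, e.g., 8 x 9 for 72 ranks), here is how the patch-size check plays out for an 80 x 80 domain:

Code:
# Sketch of the V4 minimum-patch-size check (hypothetical helper, not WRF code).
import math

MIN_PATCH = 10  # minimum decomposed patch width, per the error message above

def near_square_factors(nprocs):
    """Return the factor pair (nx, ny), nx <= ny, closest to a square."""
    best = (1, nprocs)
    for nx in range(1, math.isqrt(nprocs) + 1):
        if nprocs % nx == 0:
            best = (nx, nprocs // nx)
    return best

def decomposition_ok(e_we, e_sn, nprocs):
    nx, ny = near_square_factors(nprocs)
    # Narrowest patch in each direction (integer division, as in the error output)
    return e_we // nx >= MIN_PATCH and e_sn // ny >= MIN_PATCH

print(decomposition_ok(80, 80, 48))  # True:  6 x 8 split -> 13- and 10-cell patches
print(decomposition_ok(80, 80, 72))  # False: 8 x 9 split -> 80 // 9 = 8 < 10 in y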
 
kwerner said:
Hi,
Beginning in V4.0, a check was implemented to make sure that you are not using too many processors. Your domain is divided into smaller patches, based on the number of processors you are using and the size of your domains. Each of those patches cannot be narrower than 10 grid cells in either the x- or y-direction. Take a look at this FAQ for more information about how this works, and for guidance on choosing a "good" number of processors.

Hi,
Based on your reply and the FAQ post, we cannot have two domains, one very big and one very small, with a large number of cores.
This was not the case in previous versions, though, right? Can we turn this control off in Version 4? Or play with the threshold number and the halo size?

Thanks,
 
Hi raul1989,
The check in the code (the one that stops the run) was not present in versions prior to V4, but it should have been. It is bad practice to use too many processors for a small domain, as explained in the FAQ question mentioned above. The simulation may run, but the results are likely to be unreasonable.
 
Not if you are running the domains together. It is possible to use the ndown process to run them separately, so that you can request a different number of processors for each domain. But it's typically just not necessary to size your domains so differently. If you are interested in a nested domain that needs to be large, you may as well expand your outer domain to be much larger, since the computational cost of the outer domain is fairly negligible compared to that of the inner (higher-resolution) domain. If it's the other way around and you have a very small nested domain, then there probably shouldn't be a reason for such a large outer domain.
 
Hello,

I recently switched from WRF version 3.9 to 4.2.1. I was having the same problem described here and your FAQ was useful.

I used the subroutine mpaspect in frame/module_dm.F to calculate in advance which core counts would be accepted, to avoid triggering the error. However, I found that the code always assumes e_sn > e_we. For example, I have a domain with e_we = 270 and e_sn = 205. In this case, the maximum number of processors I can use while meeting the criterion is 400. If I switch the dimensions so that the domain is e_we = 205 and e_sn = 270, the maximum number of processors increases to 540. In this second case the criterion is also met, even though it allows me to run with considerably more processors. I tried this for other domains that are less square, such as a common simulation over the CONUS (e_we = 435 and e_sn = 255), and the differences were even larger: 625 cores (original domain) vs. 1075 (with e_sn and e_we switched). No matter what domain dimensions I try, the result is the same: if e_sn > e_we, it lets me run with more processors than if e_we > e_sn.
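
For what it's worth, these four numbers can be reproduced with a small sketch, if one assumes (based only on the behavior I am seeing, not on the actual mpaspect source) that the default decomposition only considers factor pairs with nproc_y >= nproc_x:

Code:
# Hypothetical sketch: largest rank count whose (nx, ny) split, with ny >= nx,
# keeps every patch at least 10 cells wide in both directions.
MIN_PATCH = 10

def max_ranks(e_we, e_sn, cap=2000):
    best = 0
    for total in range(1, cap + 1):
        for nx in range(1, int(total ** 0.5) + 1):
            if total % nx == 0:
                ny = total // nx  # ny >= nx: the assumed orientation bias
                if e_we // nx >= MIN_PATCH and e_sn // ny >= MIN_PATCH:
                    best = total
    return best

print(max_ranks(270, 205), max_ranks(205, 270))  # 400  540
print(max_ranks(435, 255), max_ranks(255, 435))  # 625 1075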

You may wonder why I would need to run with so many processors. The answer is that I run with chemistry, which makes my simulations far more expensive than running meteorology alone. Because of this, I usually run with between 350 and 560 cores. According to the FAQ, the minimum and maximum numbers of processors recommended for my first domain are 6 and 88, respectively. 88 is too few for chemistry, and my simulations would take a very long time. The problem is biggest when a domain is configured at very high resolution (e.g., 1 km), because then we have to use a very small time step, which makes the simulations even longer. This is why I am interested in exploring how the code can be optimized.

Perhaps there is a reason to always assume e_sn > e_we. However, I think this does not hold true for most domains that users set up. If this is not intentional, could it be fixed in a future release?

Thanks!
 
The WRF model has a simple algorithm for the default domain decomposition. However, it can be adjusted.

Suppose you would prefer 27 tasks in the i-direction and 20 tasks in the j-direction (27 x 20 = 540 total MPI ranks). In namelist.input:
Code:
&domains
 nproc_x = 27
 nproc_y = 20
/

This will allow you to tinker with the aspect ratio to find the best performance for your setup.
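
For example, for the CONUS-type domain mentioned above (e_we = 435, e_sn = 255), putting more tasks along the longer west-east dimension keeps both patch widths at the 10-cell minimum (435/43 and 255/25 both give 10-cell patches), so 1075 ranks should pass the check without swapping the domain dimensions:

Code:
&domains
 nproc_x = 43
 nproc_y = 25
/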
 