To choose an appropriate number of processors, you need to consider how the domain is decomposed across processors relative to the size of your domains. For processing, your domain is divided into tiles, and the number of tiles equals the total number of processors you use (1 tile per processor). Each tile has a minimum of 5 rows/columns on each side (called ‘halo’ regions), which are used to pass information between neighboring tiles. You do not want your entire tile to be halo regions; you need some actual space for computation in the middle of each tile. If that computation space does not exist, the model can crash or produce unrealistic output. To test for this, the model takes the total number of grid spaces in the west-east direction and divides by the number of tiles in the x-direction [(e_we)/(x-tiles)]. You want the resulting number to be, at the very least, greater than 10. Then do the same for the south-north direction [(e_sn)/(y-tiles)], again making sure the result is greater than 10.
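The tile-size check described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the model's own code: the function names are hypothetical, and the threshold of 10 is taken directly from the rule above.

```python
def points_per_tile(e_we, e_sn, x_tiles, y_tiles):
    """Grid spaces per tile in the west-east and south-north directions."""
    return e_we / x_tiles, e_sn / y_tiles


def tiles_are_large_enough(e_we, e_sn, x_tiles, y_tiles, min_points=10):
    """True if both directions leave more than `min_points` grid spaces per tile.

    min_points=10 follows the rule of thumb in this post (two 5-wide halos).
    """
    we_points, sn_points = points_per_tile(e_we, e_sn, x_tiles, y_tiles)
    return we_points > min_points and sn_points > min_points


if __name__ == "__main__":
    # Hypothetical 100 x 100 domain on a 4 x 4 decomposition: 25 points per tile.
    print(tiles_are_large_enough(100, 100, 4, 4))    # True
    # Same domain on 10 x 10 tiles: exactly 10 points per tile, which is not > 10.
    print(tiles_are_large_enough(100, 100, 10, 10))  # False
```

Run the check for every domain in your set-up, since the same decomposition is applied to all of them.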
Decomposition is determined by the two closest factors of the total number of processors. So if you choose 16 processors, the decomposition is 4x4, which is nice and even and creates a square grid. Choosing something like 11 processors would likely cause problems, because 11 is a prime number and the decomposition would be 1x11. You want to stay as close to a square decomposition as possible, though you can deviate from that somewhat.
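To see what decomposition a given processor count would produce, you can compute the two closest factors yourself. The sketch below only finds the factor pair; the function name is hypothetical, and which factor ends up in the x-direction versus the y-direction is not stated in this post, so the ordering of the returned pair is an assumption.

```python
import math


def closest_factor_pair(nprocs):
    """Return the two factors of nprocs that are closest to each other.

    The pair is returned as (smaller, larger); how these map onto
    x-tiles and y-tiles is an assumption left to the reader.
    """
    # Search downward from the integer square root for the first divisor.
    for small in range(math.isqrt(nprocs), 0, -1):
        if nprocs % small == 0:
            return small, nprocs // small


if __name__ == "__main__":
    for n in (16, 11, 12, 36):
        print(n, closest_factor_pair(n))
    # 16 -> (4, 4), 11 -> (1, 11), 12 -> (3, 4), 36 -> (6, 6)
```

A pair like (1, 11) is the warning sign: one direction is split 11 ways, so the tiles in that direction become very narrow.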
The largest number of processors you should use is determined by your smallest domain, and the smallest number of processors you should use is determined by your largest domain. This is why it is important not to have domains that vary too much in size (number of grid spaces).
You also don’t want to use too few processors, as that can make your run very slow (or impossible - the model can crash), so you’ll need to consider that, as well.
A good STARTING PLACE is to use the following equations:
For your smallest-sized domain:
((e_we)/25) * ((e_sn)/25) = maximum number of processors you should use
For your largest-sized domain:
((e_we)/100) * ((e_sn)/100) = minimum number of processors you should use
and then play around with it from there to see if you can find a good balance for the domain set-up you’re using, checking the decomposition and number of tiles. Keep in mind this is just a rule of thumb, so you may be able to pick something at the far end of one of those 2 values, or somewhere right in the middle. You may also be able to go outside those boundaries, depending on the decomposition. Each run is a little bit different. Often, even if you are using enough processors according to the minimum equation above (dividing by 100), that may still not be enough because of other factors (e.g., resolution of each domain, specific physics options, etc.), so if your simulation is failing, try using more processors.
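As a worked example of the two equations, here is a small Python sketch. The domain sizes are made up for illustration, the function names are hypothetical, and rounding each quotient down to a whole number is my assumption, since the equations do not say how to round.

```python
def max_processors(e_we, e_sn):
    """Upper bound from the smallest domain: (e_we/25) * (e_sn/25)."""
    # Assumption: each quotient is rounded down before multiplying.
    return (e_we // 25) * (e_sn // 25)


def min_processors(e_we, e_sn):
    """Lower bound from the largest domain: (e_we/100) * (e_sn/100)."""
    # Assumption: same rounding; never return fewer than 1 processor.
    return max(1, (e_we // 100) * (e_sn // 100))


if __name__ == "__main__":
    # Hypothetical set-up: smallest domain 100 x 100, largest domain 300 x 300.
    print("most:", max_processors(100, 100))   # 4 * 4 = 16
    print("least:", min_processors(300, 300))  # 3 * 3 = 9
```

For this made-up case the starting range is 9 to 16 processors, and 16 would be an attractive pick because it decomposes as 4x4.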
**Note**
If the least number of processors you can use to satisfy the compute requirements of the largest domain is GREATER THAN the most you can use for the smallest domain, then your configuration will not work. It may be necessary to use the ndown program to run your domains separately at the point when that becomes a problem (e.g., if d01, d02, d03 all work okay for a certain number of processors, but d04 is too large, then you can run the first 3 domains as a single run, and then use ndown to run the 4th domain separately).
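The condition in this note can be checked up front for a whole nest. The sketch below is not the attached script; it is a minimal illustration with hypothetical names and made-up domain sizes. It assumes "smallest" and "largest" mean fewest and most total grid spaces (e_we * e_sn), and that the quotients in the rule-of-thumb equations are rounded down.

```python
def feasible_range(domains):
    """Return (least, most) processors for a list of (e_we, e_sn) domains,
    or None when the least exceeds the most (configuration will not work).
    """
    # Assumption: domain size is measured as total grid spaces.
    smallest = min(domains, key=lambda d: d[0] * d[1])
    largest = max(domains, key=lambda d: d[0] * d[1])
    most = (smallest[0] // 25) * (smallest[1] // 25)
    least = max(1, (largest[0] // 100) * (largest[1] // 100))
    return (least, most) if least <= most else None


if __name__ == "__main__":
    # Hypothetical nest where the domains are similar enough in size.
    print(feasible_range([(100, 100), (200, 200), (300, 300)]))  # (9, 16)
    # Hypothetical nest where the innermost domain is too large.
    print(feasible_range([(100, 100), (200, 200), (500, 500)]))  # None
```

When the result is None, try dropping the largest domain from the list and re-running the check; if the remaining domains give a valid range, that largest domain is the candidate to run separately with ndown.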
You may also try the attached Python script that helps you determine the max number of processors you can use, based on your domain size and the number of processors per node on your machine. Keep in mind you'll need to modify the script for your specific case.