Different results from different numbers of processors (resolved)

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled; if you have follow-up questions related to this post, please start a new thread from the forum home page.

Hi, I noticed that different numbers of MPI processes generate different results. For instance, running the atmosphere model on the x1.2562 mesh with a single Broadwell node on Pleiades (28 cores) gives results that differ from runs on multiple Broadwell nodes (4 or 7 nodes in my tests). I don't think the difference is caused by the physics I added to the model, since these physics parameterizations are all column models. I suspect it has something to do with advection or halo exchange (i.e., the lateral exchange of information between mesh partitions) in the dynamical core.

Has anyone experienced similar issues?
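A minimal illustration of why the task count can matter at all: floating-point addition is not associative, so any computation whose order of operations depends on how the mesh is split can differ in the last bits between decompositions. This is a toy, not MPAS code; `serial_sum` and `partitioned_sum` are hypothetical helpers, with `partitioned_sum` standing in for a decomposition in which each task sums its own chunk before the partial sums are combined.

```python
def serial_sum(values):
    """Left-to-right summation with an explicit loop.

    Python's built-in sum() uses compensated summation for floats since 3.12,
    which would hide the rounding effect this sketch is meant to show.
    """
    total = 0.0
    for v in values:
        total += v
    return total


def partitioned_sum(values, nparts):
    """Hypothetical nparts-way decomposition: each part sums its own
    contiguous chunk, then the partial sums are added left to right."""
    n = len(values)
    bounds = [n * p // nparts for p in range(nparts + 1)]
    partials = [serial_sum(values[bounds[p]:bounds[p + 1]]) for p in range(nparts)]
    return serial_sum(partials)


values = [0.1, 0.2, 0.3]
print(partitioned_sum(values, 1))  # 0.6000000000000001, i.e. (0.1 + 0.2) + 0.3
print(partitioned_sum(values, 2))  # 0.6, i.e. 0.1 + (0.2 + 0.3)
```

Every individual addition is correctly rounded; only the order differs, which is why bitwise comparison across task counts is such a demanding test.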

We have seen similar issues that appear to be the result of compiler optimizations of various sorts. Have you tried turning off optimizations in the make target for your compiler in the top-level Makefile to see whether that gives you bitwise-identical results for different MPI task counts? For example, if you're using the Intel compilers, you could try editing the 'ifort' target to look like the following:
ifort:
        ( $(MAKE) all \
        "FC_PARALLEL = mpif90" \
        "CC_PARALLEL = mpicc" \
        "CXX_PARALLEL = mpicxx" \
        "FC_SERIAL = ifort" \
        "CC_SERIAL = icc" \
        "CXX_SERIAL = icpc" \
        "FFLAGS_PROMOTION = -real-size 64" \
        "FFLAGS_OPT = -O0 -fp-model precise -convert big_endian -free -align array64byte" \
        "CFLAGS_OPT = -O0 -fp-model precise" \
        "CXXFLAGS_OPT = -O0" \
        "LDFLAGS_OPT = -O0" \
        "FFLAGS_DEBUG = -g -convert big_endian -free -CU -CB -check all -fpe0 -traceback" \
        "CFLAGS_DEBUG = -g -traceback" \
        "CXXFLAGS_DEBUG = -g -traceback" \
        "LDFLAGS_DEBUG = -g -fpe0 -traceback" \
        "FFLAGS_OMP = -qopenmp" \
        "CFLAGS_OMP = -qopenmp" \
        "CORE = $(CORE)" \
        "DEBUG = $(DEBUG)" \
        "USE_PAPI = $(USE_PAPI)" \
        "OPENMP = $(OPENMP)" \
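The relevant changes are in the *_OPT lines, which lower the optimization level to -O0; the target continues past the last line shown, and its remaining line(s), through the closing parenthesis, can stay as they are in the Makefile. For the Intel Fortran compiler (ifort) specifically, it can help to add the '-fp-model precise' flag (as I've done in the above example).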
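Reassociating sums is one way optimization can change results; another is contracting a*b + c into a fused multiply-add (FMA), which rounds once instead of twice. One plausible route to task-count dependence (an assumption, not confirmed in this thread) is that a vectorized loop and its remainder loop may be compiled differently, and the split between them depends on how many cells each task owns. The sketch below emulates an FMA with exact `fractions.Fraction` arithmetic; `mul_then_add` and `fused_mul_add` are hypothetical helpers, and the inputs are chosen only to expose the extra rounding.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """a*b + c with two roundings: the product is rounded to double, then the sum."""
    return a * b + c


def fused_mul_add(a, b, c):
    """a*b + c with a single rounding, emulating a hardware FMA.

    Fraction arithmetic is exact, and converting the exact result back to
    float rounds to nearest exactly once.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = b = 1.0 + 2.0**-27   # exact a*b is 1 + 2**-26 + 2**-54
c = -(1.0 + 2.0**-26)

print(mul_then_add(a, b, c))   # 0.0 (the 2**-54 term is lost when a*b is rounded)
print(fused_mul_add(a, b, c))  # 5.551115123125783e-17, i.e. exactly 2**-54
```

Flags such as '-fp-model precise' are intended to restrict value-changing transformations like these; the compiler documentation lists exactly which ones a given setting disables.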

Another test that might be worth trying is to turn off all physics schemes and see whether you get bitwise-identical results with just the dynamical core; that might help narrow the issue down to either the physics or the dynamics. The easiest way to turn off physics in the v7.0 release of the model would be to set
        config_physics_suite = 'none'
in the &physics group in the namelist.atmosphere file.
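For the comparisons suggested above (optimized vs. -O0 builds, with and without physics), a bitwise check is stricter than a tolerance check. A minimal sketch, assuming the same output field from two runs has already been read into Python sequences of floats (reading the netCDF output is not shown); `compare_fields` and `float_bits` are hypothetical helpers, and the numbers in the usage lines are made up, not MPAS output.

```python
import struct


def float_bits(x):
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def compare_fields(run_a, run_b):
    """Hypothetical helper comparing one field from two runs.

    Returns (bitwise_identical, max_abs_diff). Comparing bit patterns rather
    than using == also catches differences such as 0.0 vs -0.0.
    """
    if len(run_a) != len(run_b):
        raise ValueError("fields have different lengths")
    identical = all(float_bits(x) == float_bits(y) for x, y in zip(run_a, run_b))
    max_diff = max((abs(x - y) for x, y in zip(run_a, run_b)), default=0.0)
    return identical, max_diff


one_node = [287.15, 101325.0, 0.1 + 0.2]     # made-up values
four_nodes = [287.15, 101325.0, 0.3]         # made-up values
print(compare_fields(one_node, one_node))    # (True, 0.0)
print(compare_fields(one_node, four_nodes))  # (False, 5.551115123125783e-17)
```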
Hi Michael, thanks for the quick reply. I found what I did wrong. The dynamical core itself (including various combinations of split time steps) doesn't have any issues with the number of processors. The issue came from a mistake of mine when horizontally averaging wind vectors between two cells: for one of the wind vectors, I accidentally used the cell-center value instead of the edge value. The issue is now resolved. Thanks again.
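The thread does not show the offending code, so the following is only a hypothetical illustration of how a cell/edge indexing mix-up can make results depend on the task count. In MPAS each task works with its own local numbering of cells and edges, with connectivity given by mesh arrays such as `cellsOnEdge`; the toy below uses 0-based Python lists in that role, and all function names are hypothetical. A correct cell-to-edge average gives the same value for each physical edge under two different local numberings, while a version that uses the edge index where a cell index belongs does not.

```python
def edge_average(cell_values, cells_on_edge):
    """Average the two cell-centered values on either side of each edge."""
    return [0.5 * (cell_values[c1] + cell_values[c2]) for c1, c2 in cells_on_edge]


def edge_average_buggy(cell_values, cells_on_edge):
    """Hypothetical bug: the edge index e is used where the second cell index belongs."""
    return [0.5 * (cell_values[c1] + cell_values[e])
            for e, (c1, _c2) in enumerate(cells_on_edge)]


def by_global_edge(local_result, edge_names):
    """Map a per-edge result from local numbering back to global edge names."""
    return dict(zip(edge_names, local_result))


# Two hypothetical local numberings of the same 3-cell mesh (cells A, B, C; edges AB, BC).
decomp1 = {"cells": [1.0, 3.0, 5.0],          # local order: A, B, C
           "edges": ["AB", "BC"],
           "cells_on_edge": [(0, 1), (1, 2)]}
decomp2 = {"cells": [5.0, 3.0, 1.0],          # local order: C, B, A
           "edges": ["BC", "AB"],
           "cells_on_edge": [(0, 1), (2, 1)]}

for fn in (edge_average, edge_average_buggy):
    results = [by_global_edge(fn(d["cells"], d["cells_on_edge"]), d["edges"])
               for d in (decomp1, decomp2)]
    # Prints whether both numberings agree for every physical edge.
    print(fn.__name__, results[0] == results[1])
# edge_average True
# edge_average_buggy False
```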