Different results from different numbers of processors (resolved)

yuanlian · Feb 8, 2021

Hi, I noticed that different numbers of processes generate different results. For instance, running the atmosphere model on the x1.2562 mesh on a single Broadwell node on Pleiades (28 cores) produces results that differ from runs on multiple Broadwell nodes (4 or 7 nodes in my tests). I don't think the difference is caused by the physics I added to the model, since those parameterizations are all column models. I suspect it has something to do with advection or halo exchange (i.e., the lateral exchange of information between mesh partitions) in the dynamical core.

Has anyone experienced similar issues?

Thanks,
Yuan

mgduda · Feb 8, 2021

We have seen similar issues that appear to be the result of compiler optimizations of various sorts. Have you tried turning off optimizations in the make target for your compiler in the top-level Makefile to see whether that enables you to get bitwise identical results for different MPI task counts? For example, if you're using the Intel compilers, you could try editing the 'ifort' target to look like the following:

Code:

ifort:
        ( $(MAKE) all \
        "FC_PARALLEL = mpif90" \
        "CC_PARALLEL = mpicc" \
        "CXX_PARALLEL = mpicxx" \
        "FC_SERIAL = ifort" \
        "CC_SERIAL = icc" \
        "CXX_SERIAL = icpc" \
        "FFLAGS_PROMOTION = -real-size 64" \
        "FFLAGS_OPT = -O0 -fp-model precise -convert big_endian -free -align array64byte" \
        "CFLAGS_OPT = -O0 -fp-model precise" \
        "CXXFLAGS_OPT = -O0" \
        "LDFLAGS_OPT = -O0" \
        "FFLAGS_DEBUG = -g -convert big_endian -free -CU -CB -check all -fpe0 -traceback" \
        "CFLAGS_DEBUG = -g -traceback" \
        "CXXFLAGS_DEBUG = -g -traceback" \
        "LDFLAGS_DEBUG = -g -fpe0 -traceback" \
        "FFLAGS_OMP = -qopenmp" \
        "CFLAGS_OMP = -qopenmp" \
        "CORE = $(CORE)" \
        "DEBUG = $(DEBUG)" \
        "USE_PAPI = $(USE_PAPI)" \
        "OPENMP = $(OPENMP)" \
        "CPPFLAGS = $(MODEL_FORMULATION) -D_MPI" )

For the Intel Fortran compiler (ifort) specifically, it can help to add the '-fp-model precise' flag, which disables floating-point optimizations that are not value-safe (as I've done in the above example).
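One general reason optimization settings can matter: floating-point addition is not associative. If an optimization such as vectorization, or a decomposition-dependent grouping, changes the order in which the same numbers are combined, the last bits of the result can change, and in a chaotic atmospheric model such tiny differences grow over time. The Python sketch below illustrates only that general point. `partitioned_sum` is a hypothetical stand-in for a reduction over a decomposed domain and is not how MPAS performs any particular computation.

```python
# Illustration only: floating-point addition is not associative, so summing
# the same numbers with a different grouping can change the last bits.

def serial_sum(values):
    """Sum values left to right, as a single task would."""
    total = 0.0
    for v in values:
        total += v
    return total


def partitioned_sum(values, ntasks):
    """Split values into ntasks contiguous chunks, sum each chunk, then sum
    the partial results in order (a hypothetical stand-in for a reduction
    over a decomposed domain)."""
    n = len(values)
    bounds = [n * k // ntasks for k in range(ntasks + 1)]
    partials = [serial_sum(values[bounds[k]:bounds[k + 1]]) for k in range(ntasks)]
    return serial_sum(partials)


values = [0.1, 0.2, 0.3]
for ntasks in (1, 2):
    # 1 task: (0.1 + 0.2) + 0.3;  2 tasks: 0.1 + (0.2 + 0.3)
    print(ntasks, repr(partitioned_sum(values, ntasks)))
```

With these inputs the single-task grouping gives 0.6000000000000001 and the two-task grouping gives 0.6. Both are valid roundings, but they are not bitwise identical.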

Another test that might be worth trying is to turn off all physics schemes and see whether you get bitwise identical results with just the dynamical core; that might help in tracking the issue down to either physics or dynamics. The easiest way to turn off physics in the v7.0 release of the model would be to set

Code:

config_physics_suite = 'none'

in the &physics group in the namelist.atmosphere file.
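For that comparison, an exact bit-pattern check is stricter than a tolerance check and is what "bitwise identical" means. Below is a minimal Python sketch. It assumes the same field from two runs with different MPI task counts has already been read into two flat lists of floats (reading the model output files is not shown). `compare_fields` is a hypothetical helper, not part of MPAS.

```python
import struct


def bits(x):
    """Return the 64-bit IEEE-754 bit pattern of a Python float as an int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def compare_fields(a, b):
    """Compare two equal-length sequences of floats from two runs.

    Returns (identical, first_diff_index, max_abs_diff). `identical` is True
    only if every pair of values has exactly the same bit pattern, so 0.0
    and -0.0 count as different, while NaNs with equal bits count as equal.
    """
    if len(a) != len(b):
        raise ValueError("fields have different lengths")
    first = None
    max_diff = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if bits(x) != bits(y):
            if first is None:
                first = i
            max_diff = max(max_diff, abs(x - y))
    return first is None, first, max_diff


# Usage with made-up values: the first entries differ only in the last bit.
run_a = [0.1 + 0.2 + 0.3, 1.0]
run_b = [0.1 + (0.2 + 0.3), 1.0]
print(compare_fields(run_a, run_b))
```

Reporting the first differing index alongside the maximum difference helps locate where two runs start to diverge.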

yuanlian · Feb 9, 2021

Hi Michael, thanks for the quick reply. I found what I did wrong. The dynamical core itself (including various combinations of split time steps) shows no dependence on the number of processors. The problem came from a mistake in some horizontal averaging of wind vectors between two cells: for one of the wind vectors, I accidentally used the value at the cell center instead of at the edge. The issue is now resolved. Thanks again.
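The post does not say exactly how the mix-up entered the code. The following is a hypothetical Python sketch of one way a cell/edge location error can make results depend on the decomposition. Each partition numbers its cells locally, so a cell array indexed with an edge index reads whichever cell happens to carry that local number. Here `cells_on_edge` is a simplified, 0-based stand-in for a cellsOnEdge-style connectivity array, and the mesh and values are invented.

```python
# Hypothetical sketch: averaging two cell-centered values onto the edge they
# share, correctly and with a cell/edge index mix-up.

def edge_mean(cell_values, cells_on_edge, edge):
    """Correct: average the two cells that share `edge`."""
    c1, c2 = cells_on_edge[edge]
    return 0.5 * (cell_values[c1] + cell_values[c2])


def edge_mean_buggy(cell_values, cells_on_edge, edge):
    """Buggy: one operand indexes the cell array with the edge index."""
    c1, _ = cells_on_edge[edge]
    return 0.5 * (cell_values[c1] + cell_values[edge])


# Three cells with global ids 'A', 'B', 'C' and two edges (invented values).
VALUES = {"A": 1.0, "B": 3.0, "C": 7.0}
EDGES = [("A", "B"), ("B", "C")]


def local_layout(cell_order):
    """Number the cells locally in `cell_order`, as one partition might."""
    index = {cid: i for i, cid in enumerate(cell_order)}
    cell_values = [VALUES[cid] for cid in cell_order]
    cells_on_edge = [(index[a], index[b]) for a, b in EDGES]
    return cell_values, cells_on_edge


for order in (["A", "B", "C"], ["C", "B", "A"]):
    cv, coe = local_layout(order)
    print(order, edge_mean(cv, coe, 0), edge_mean_buggy(cv, coe, 0))
```

The correct average for edge 0 is the same under both numberings (2.0), while the buggy version changes with the local ordering (1.0 versus 4.0).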

mgduda · Feb 9, 2021

Thanks for following up, and I'm glad to hear you've tracked down the issue!