Not so much WRF speedup using multiple processors

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

gael_descombes

Hello,

I am currently using WRFV4 on a cluster with the Intel compiler
(composer_xe_2013.1.117). I can compile and run WRF without any problem,
but it seems to use only one processor even though I select the parallel
options in configure.

I am running on an Intel(R) Xeon(R) E5520. The infocpu command gives:
===== Processor composition =====
Processor name : Intel(R) Xeon(R) E5520
Packages(sockets) : 2
Cores : 8
Processors(CPUs) : 16
Cores per package : 4
Threads per core : 2
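
For reference, the arithmetic behind the listing above can be spelled out in a few lines. This is only an illustrative sketch: the variable names are made up here and the values are copied from the listing. It shows that the node has 8 physical cores, and that the 16 "Processors(CPUs)" are logical CPUs obtained through 2 threads per core.

```python
# Values copied from the processor listing above; names are illustrative only.
packages = 2            # Packages(sockets)
cores_per_package = 4   # Cores per package
threads_per_core = 2    # Threads per core

physical_cores = packages * cores_per_package
logical_cpus = physical_cores * threads_per_core

print(f"physical cores: {physical_cores}")
print(f"logical CPUs:   {logical_cpus}")
```

Both results agree with the "Cores : 8" and "Processors(CPUs) : 16" lines of the listing.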

I've tested two of the options offered by configure, 15 and 20:
13. (serial) 14. (smpar) 15. (dmpar) 16. (dm+sm) INTEL (ifort/icc)
17. (dm+sm) INTEL (ifort/icc): Xeon Phi (MIC architecture)
18. (serial) 19. (smpar) 20. (dmpar) 21. (dm+sm) INTEL (ifort/icc): Xeon (SNB with AVX mods)
22. (serial) 23. (smpar) 24. (dmpar) 25. (dm+sm) INTEL (ifort/icc): SGI MPT
26. (serial) 27. (smpar) 28. (dmpar) 29. (dm+sm) INTEL (ifort/icc): IBM POE

==> When I run wrf with mpirun -np=2 wrf.exe or mpirun -np=8 wrf.exe, I
get almost no speedup. Only rsl.out.0000 shows the time integration.
Do you have any idea why the parallelization is not efficient in my case?
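
One way to check which rsl.out.* files actually report time steps, and how long each step takes, is to scan them for timing lines. The sketch below is not part of WRF. It assumes the timing lines look like "Timing for main: time <date> on domain 1: 3.25 elapsed seconds", and the function names step_times and summarize are hypothetical.

```python
import re
from pathlib import Path

# Assumed shape of a WRF timing line, e.g.
# "Timing for main: time 2000-01-24_12:03:00 on domain   1:    3.25 elapsed seconds"
TIMING_RE = re.compile(r"Timing for main.*?:\s*([0-9.]+)\s+elapsed seconds")


def step_times(text):
    """Return the elapsed seconds of every 'Timing for main' line in text."""
    return [float(m.group(1)) for m in TIMING_RE.finditer(text)]


def summarize(run_dir):
    """Map each rsl.out.* file name in run_dir to (step count, mean seconds)."""
    summary = {}
    for path in sorted(Path(run_dir).glob("rsl.out.*")):
        times = step_times(path.read_text(errors="replace"))
        mean = sum(times) / len(times) if times else 0.0
        summary[path.name] = (len(times), mean)
    return summary


if __name__ == "__main__":
    # Run from the directory that holds the rsl.out.* files.
    for name, (count, mean) in summarize(".").items():
        print(f"{name}: {count} timed steps, mean {mean:.2f} s")
```

Comparing the mean step time from a 2-process run with that of an 8-process run gives a rough speedup figure. The number of rsl.out.* files present also shows how many MPI ranks wrote output.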

Thanks,
Gael



The config file that I've got is:

SHELL = /bin/sh
DEVTOP = `pwd`
LIBINCLUDE = .
.SUFFIXES: .F .i .o .f90 .c
#### Get core settings from environment (set in compile script)
#### Note to add a core, this has to be added to.
COREDEFS = -DEM_CORE=$(WRF_EM_CORE) \
-DNMM_CORE=$(WRF_NMM_CORE) -DNMM_MAX_DIM=2600 \
-DDA_CORE=$(WRF_DA_CORE) \
-DWRFPLUS=$(WRF_PLUS_CORE)
#### Single location for defining total number of domains. You need
#### at least 1 + 2*(number of total nests). For example, 1 coarse
#### grid + three fine grids = 1 + 2(3) = 7, so MAX_DOMAINS=7.
MAX_DOMAINS = 21
#### DM buffer length for the configuration flags.
CONFIG_BUF_LEN = 65536
MAX_HISTORY = 25
IWORDSIZE = 4
DWORDSIZE = 8
LWORDSIZE = 4
NATIVE_RWORDSIZE = 4
SED_FTN = $(WRF_SRC_ROOT_DIR)/tools/standard.exe
IO_GRIB_SHARE_DIR =
ESMF_COUPLING = 0
# select dependences on module_utility.o
ESMF_MOD_DEPENDENCE = $(WRF_SRC_ROOT_DIR)/external/esmf_time_f90/module_utility.o
# select -I options for external/io_esmf vs. external/esmf_time_f90
ESMF_IO_INC = -I$(WRF_SRC_ROOT_DIR)/external/esmf_time_f90
# select -I options for separately installed ESMF library, if present
ESMF_MOD_INC = $(ESMF_IO_INC)
# select cpp token for external/io_esmf vs. external/esmf_time_f90
ESMF_IO_DEFS =
# select build target for external/io_esmf vs. external/esmf_time_f90
ESMF_TARGET = esmf_time
NETCDF4_IO_OPTS = -DUSE_NETCDF4_FEATURES -DWRFIO_NCD_LARGE_FILE_SUPPORT
GPFS =
CURL =
HDF5 =
ZLIB =
DEP_LIB_PATH =
NETCDF4_DEP_LIB = $(DEP_LIB_PATH) $(HDF5) $(ZLIB) $(GPFS) $(CURL)
LIBWRFLIB = libwrflib.a
DESCRIPTION = INTEL ($SFC/$SCC)
DMPARALLEL = 1
OMPCPP = # -D_OPENMP
OMP = # -openmp -fpp -auto
OMPCC = # -openmp -fpp -auto
SFC = ifort
SCC = icc
CCOMP = icc
DM_FC = mpif90 -f90=$(SFC)
DM_CC = mpicc -cc=$(SCC)
FC = time $(DM_FC)
CC = $(DM_CC) -DFSEEKO64_OK
LD = $(FC)
RWORDSIZE = $(NATIVE_RWORDSIZE)
PROMOTION = -real-size `expr 8 \* $(RWORDSIZE)` -i4
ARCH_LOCAL = -DNONSTANDARD_SYSTEM_FUNC -DWRF_USE_CLM
CFLAGS_LOCAL = -w -O3 -ip #-xHost -fp-model fast=2 -no-prec-div -no-prec-sqrt -ftz -no-multibyte-chars
LDFLAGS_LOCAL = -ip #-xHost -fp-model fast=2 -no-prec-div -no-prec-sqrt -ftz -align all -fno-alias -fno-common
CPLUSPLUSLIB =
ESMF_LDFLAG = $(CPLUSPLUSLIB)
FCOPTIM = -O3
FCREDUCEDOPT = $(FCOPTIM)
FCNOOPT = -O0 -fno-inline -no-ip
FCDEBUG = # -g $(FCNOOPT) -traceback # -fpe0 -check noarg_temp_created,bounds,format,output_conversion,pointers,uninit -ftrapuv -unroll0 -u
FORMAT_FIXED = -FI
FORMAT_FREE = -FR
FCSUFFIX =
BYTESWAPIO = -convert big_endian
RECORDLENGTH = -assume byterecl
FCBASEOPTS_NO_G = -ip -fp-model precise -w -ftz -align all -fno-alias $(FORMAT_FREE) $(BYTESWAPIO) #-xHost -fp-model fast=2 -no-heap-arrays -no-prec-div -no-prec-sqrt -fno-common
FCBASEOPTS = $(FCBASEOPTS_NO_G) $(FCDEBUG)
MODULE_SRCH_FLAG =
TRADFLAG = -traditional-cpp
CPP = /lib/cpp -P -nostdinc
AR = ar
ARFLAGS = ru
M4 = m4
RANLIB = ranlib
RLFLAGS =
CC_TOOLS = $(SCC)

POSTAMBLE

FGREP = fgrep -iq

ARCHFLAGS = $(COREDEFS) -DIWORDSIZE=$(IWORDSIZE) -DDWORDSIZE=$(DWORDSIZE) -DRWORDSIZE=$(RWORDSIZE) -DLWORDSIZE=$(LWORDSIZE) \
$(ARCH_LOCAL) \
$(DA_ARCHFLAGS) \
-DDM_PARALLEL \
\
-DNETCDF \
-DPNETCDF \
\
\
\
\
-DHDF5 \
\
\
\
\
-DUSE_ALLOCATABLES \
-Dwrfmodel \
-DGRIB1 \
-DINTIO \
-DKEEP_INT_AROUND \
-DLIMIT_ARGS \
-DBUILD_RRTMG_FAST=1 \
-DCONFIG_BUF_LEN=$(CONFIG_BUF_LEN) \
-DMAX_DOMAINS_F=$(MAX_DOMAINS) \
-DMAX_HISTORY=$(MAX_HISTORY) \
-DNMM_NEST=$(WRF_NMM_NEST)
CFLAGS = $(CFLAGS_LOCAL) -DDM_PARALLEL \
-DMAX_HISTORY=$(MAX_HISTORY) -DNMM_CORE=$(WRF_NMM_CORE)
FCFLAGS = $(FCOPTIM) $(FCBASEOPTS)
ESMF_LIB_FLAGS =
# ESMF 5 -- these are defined in esmf.mk, included above
ESMF_IO_LIB = -L$(WRF_SRC_ROOT_DIR)/external/esmf_time_f90 -lesmf_time
ESMF_IO_LIB_EXT = -L$(WRF_SRC_ROOT_DIR)/external/esmf_time_f90 -lesmf_time
INCLUDE_MODULES = $(MODULE_SRCH_FLAG) \
$(ESMF_MOD_INC) $(ESMF_LIB_FLAGS) \
-I$(WRF_SRC_ROOT_DIR)/main \
-I$(WRF_SRC_ROOT_DIR)/external/io_netcdf \
-I$(WRF_SRC_ROOT_DIR)/external/io_int \
-I$(WRF_SRC_ROOT_DIR)/frame \
-I$(WRF_SRC_ROOT_DIR)/share \
-I$(WRF_SRC_ROOT_DIR)/phys \
-I$(WRF_SRC_ROOT_DIR)/wrftladj \
-I$(WRF_SRC_ROOT_DIR)/chem -I$(WRF_SRC_ROOT_DIR)/inc \
-I$(NETCDFPATH)/include \

REGISTRY = Registry
CC_TOOLS_CFLAGS = -DNMM_CORE=$(WRF_NMM_CORE)

LIB_BUNDLED = \
$(WRF_SRC_ROOT_DIR)/external/fftpack/fftpack5/libfftpack.a \
$(WRF_SRC_ROOT_DIR)/external/io_grib1/libio_grib1.a \
$(WRF_SRC_ROOT_DIR)/external/io_grib_share/libio_grib_share.a \
$(WRF_SRC_ROOT_DIR)/external/io_int/libwrfio_int.a \
$(ESMF_IO_LIB) \
$(WRF_SRC_ROOT_DIR)/external/RSL_LITE/librsl_lite.a \
$(WRF_SRC_ROOT_DIR)/frame/module_internal_header_util.o \
$(WRF_SRC_ROOT_DIR)/frame/pack_utils.o

LIB_EXTERNAL = \
-L$(WRF_SRC_ROOT_DIR)/external/io_netcdf -lwrfio_nf -L/share/software/netcdf4/lib -lnetcdff -lnetcdf -L$(WRF_SRC_ROOT_DIR)/external/io_pnetcdf -lwrfio_pnf -L/share/software/pnetcdf/lib -lpnetcdf -L/share/software/hdf5/lib -lhdf5_fortran -lhdf5 -lm -lz
LIB = $(LIB_BUNDLED) $(LIB_EXTERNAL) $(LIB_LOCAL) $(LIB_WRF_HYDRO)
LDFLAGS = $(OMP) $(FCFLAGS) $(LDFLAGS_LOCAL)
ENVCOMPDEFS =
WRF_CHEM = 0
CPPFLAGS = $(ARCHFLAGS) $(ENVCOMPDEFS) -I$(LIBINCLUDE) $(TRADFLAG)
NETCDFPATH = /share/software/netcdf4
HDF5PATH = /share/software/hdf5
WRFPLUSPATH =
RTTOVPATH =
PNETCDFPATH = /share/software/pnetcdf

bundled: io_only
external: io_only $(WRF_SRC_ROOT_DIR)/external/RSL_LITE/librsl_lite.a gen_comms_rsllite module_dm_rsllite $(ESMF_TARGET)
io_only: esmf_time wrfio_nf wrfio_pnf \
wrf_ioapi_includes wrfio_grib_share wrfio_grib1 wrfio_int fftpack
 
Hi,
Can you package your rsl.out.* files together for a 2-processor run, and then for an 8-processor run (in *.TAR files) so that I can take a look? Please also send your namelist.input file. Thanks!
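
A minimal sketch of one way to build those archives, using Python's standard tarfile module, is shown here. The helper name pack_rsl and the directory and archive names in the usage note are hypothetical, and it assumes each run's rsl.out.* files and namelist.input sit together in one directory.

```python
import tarfile
from pathlib import Path


def pack_rsl(run_dir, archive_name):
    """Bundle rsl.out.* and namelist.input from run_dir into an uncompressed tar file.

    Returns the list of file names that were added.
    """
    run_dir = Path(run_dir)
    members = sorted(run_dir.glob("rsl.out.*"))
    namelist = run_dir / "namelist.input"
    if namelist.exists():
        members.append(namelist)
    with tarfile.open(archive_name, "w") as tar:
        for path in members:
            # Store files flat, without the run directory prefix.
            tar.add(path, arcname=path.name)
    return [p.name for p in members]
```

For example, pack_rsl("run_2proc", "rsl_2proc.tar") and pack_rsl("run_8proc", "rsl_8proc.tar") would produce one archive per run, assuming the two runs were kept in separate directories with those names.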
 