
WRF simulations are very slow on multicore processors

hhhyd

Hi,

I am running WRF simulations with mpirun on a Linux cluster with GNU compilers. Currently I am using 9 nodes with a total of 36 cores, and the CPU usage appears efficient.

I have four domains, each with a grid size of 250 × 150. However, after a full day of wall-clock time the model has only advanced 11 seconds of simulated time. Do you have any suggestions to improve the calculation efficiency?

I've also attached my namelist here, as I suspect some settings might be contributing to the increased simulation time.

Thank you!
 

Attachments

  • namelist.input (5 KB)
Something is obviously wrong if it takes a whole day to simulate 11 seconds. In a cluster, data is shared between nodes over the network, and if the network is not working properly the CPUs just sit and wait for data to arrive, even though top will show them as fully utilized. So the first thing to do is to test the network thoroughly: the physical links and quality of the connections, but also routing tables and hostnames. Verify that throughput and latency are what you expect and that nothing else is saturating the network, switches, routers, and so on.
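
A quick way to go beyond a hello-world job is a point-to-point ping-pong between two ranks placed on different nodes, which shows whether latency and bandwidth are anywhere near what the hardware should deliver. The sketch below is only an illustration, assuming any MPI compiled with mpicc; the file name, message sizes, and repetition count are arbitrary choices, nothing WRF- or MPICH-specific.

/* ping_pong.c - minimal MPI point-to-point latency/bandwidth check.
   Illustrative sketch only: file name, message sizes, and repetition
   count are arbitrary.
   Build: mpicc -O2 ping_pong.c -o ping_pong
   Run:   mpirun -np 2 ./ping_pong   (place the two ranks on different nodes) */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    const int reps = 1000;
    const size_t sizes[2] = { 8, 1 << 20 };   /* 8 B for latency, 1 MiB for bandwidth */

    for (int s = 0; s < 2; s++) {
        size_t nbytes = sizes[s];
        char *buf = malloc(nbytes);
        memset(buf, 0, nbytes);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, (int)nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, (int)nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, (int)nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, (int)nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0) {
            double rtt  = (t1 - t0) / reps;              /* seconds per round trip */
            double mbps = 2.0 * nbytes / rtt / 1.0e6;    /* two transfers per trip */
            printf("%8zu bytes: %10.1f us round trip, about %8.1f MB/s\n",
                   nbytes, rtt * 1.0e6, mbps);
        }
        free(buf);
    }

    MPI_Finalize();
    return 0;
}

If the 1 MiB figure comes out far below the nominal bandwidth of the interconnect, or the 8-byte round trip is in the millisecond range, the problem is in the network rather than in WRF.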
 
Thanks for your reply. I am going to check the network, but it appears to be working fine for MPI: I can run a very simple MPI test code with mpirun. Additionally, I am wondering whether using dmpar and smpar together (option 35, (dm+sm) GNU (gfortran/gcc)) for the WRF compilation could affect WRF's MPI performance? The MPI library that WRF is built against reports the following:
HYDRA build details:
Version: 4.2.2
Release Date: Wed Jul 3 09:16:22 AM CDT 2024
CC: gcc
Configure options: '--disable-option-checking' '--prefix=/home/houyidi/usr/local/mpich-4.2.2' '--with-hwloc=embedded' '--enable-fast=O3' '--enable-cxx' '--with-device=ch4:ofi' '--cache-file=/dev/null' '--srcdir=../../../../src/pm/hydra' 'CC=gcc' 'CFLAGS= -O3' 'LDFLAGS=' 'LIBS=' 'CPPFLAGS= -DNETMOD_INLINE=__netmod_inline_ofi__ -I/home/houyidi/TEMP/mpich-4.2.2/build/src/mpl/include -I/home/houyidi/TEMP/mpich-4.2.2/src/mpl/include -I/home/houyidi/TEMP/mpich-4.2.2/modules/json-c -I/home/houyidi/TEMP/mpich-4.2.2/build/modules/json-c -I/home/houyidi/TEMP/mpich-4.2.2/modules/hwloc/include -I/home/houyidi/TEMP/mpich-4.2.2/build/modules/hwloc/include -D_REENTRANT -I/home/houyidi/TEMP/mpich-4.2.2/build/src/mpi/romio/include -I/home/houyidi/TEMP/mpich-4.2.2/src/pmi/include -I/home/houyidi/TEMP/mpich-4.2.2/build/src/pmi/include -I/home/houyidi/TEMP/mpich-4.2.2/build/modules/yaksa/src/frontend/include -I/home/houyidi/TEMP/mpich-4.2.2/modules/yaksa/src/frontend/include -I/home/houyidi/TEMP/mpich-4.2.2/build/modules/libfabric/include -I/home/houyidi/TEMP/mpich-4.2.2/modules/libfabric/include'
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge manual persist
Topology libraries available: hwloc
Resource management kernels available: user slurm ll lsf sge pbs cobalt
Demux engines available: poll select
I really appreciate your help and suggestions.
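
As an aside, one common reason a dm+sm build runs far slower than dmpar is thread oversubscription: every MPI rank also spawns OpenMP threads, and if ranks × threads exceeds the physical cores on a node, performance collapses. A tiny hybrid MPI+OpenMP check like the sketch below (an illustration only; the file name and build line are assumptions, not part of the WRF build system) reports which MPI library the binary actually uses, where each rank lands, and how many threads each rank would start.

/* hybrid_hello.c - report which MPI library the binary uses, where each rank
   runs, and how many OpenMP threads it would spawn. Illustrative sketch only.
   Build: mpicc -fopenmp hybrid_hello.c -o hybrid_hello
   Run:   mpirun -np <ranks> ./hybrid_hello   (same OMP_NUM_THREADS as for wrf.exe) */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    char lib[MPI_MAX_LIBRARY_VERSION_STRING];

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    if (rank == 0) {
        MPI_Get_library_version(lib, &len);
        printf("MPI library: %.72s\n", lib);   /* first part of the version string */
    }

    printf("rank %3d of %d on %s: OpenMP max threads = %d\n",
           rank, size, host, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}

If the per-node total of ranks × threads is larger than the physical core count, that alone can explain extremely slow time steps with a dm+sm build.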
 
I am wondering whether using dmpar and smpar together (option 35, (dm+sm) GNU (gfortran/gcc)) for the WRF compilation could affect WRF's MPI performance?
Hi, Does this mean that you installed WRF with the dm+sm option? If so, please try dmpar instead. We typically don't suggest using dm+sm because results have been unfavorable.

I am using 9 nodes with a total of 36 cores
Per node, are there only 4 cores available? If there are more, try to use all the cores available for each node. You can also increase the number of processors you're using. 36 is not very many for the size of your domains. You should be able to use 100+.
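
To get a feel for what different rank counts mean for a 250 × 150 domain, the rough sketch below may help. It is not WRF's internal decomposition routine; it simply picks the most square factor pair and reports the approximate patch each rank would own, using rank counts that 9 nodes could provide as examples. A commonly cited rule of thumb is to keep each patch above roughly 10 × 10 grid points.

/* patch_size.c - rough look at how a 250 x 150 domain would be split across
   MPI ranks. NOT WRF's internal decomposition algorithm; it only picks the
   most square factor pair to show approximate patch sizes. The rank counts
   are examples of what 9 nodes could provide.
   Build: gcc patch_size.c -o patch_size && ./patch_size */
#include <stdio.h>

int main(void)
{
    const int e_we = 250, e_sn = 150;             /* grid size from the post       */
    const int ntasks[] = { 36, 108, 216, 324 };   /* 9 nodes x 4, 12, 24, 36 cores */

    for (int i = 0; i < 4; i++) {
        int n = ntasks[i], px = 1, py = n;
        for (int d = 1; d * d <= n; d++)          /* most square factor pair       */
            if (n % d == 0) { px = d; py = n / d; }
        printf("%3d ranks -> %2d x %2d patches of roughly %3d x %3d grid points\n",
               n, px, py, e_we / px, e_sn / py);
    }
    return 0;
}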
 
Hi, Does this mean that you installed WRF with the dm+sm option? If so, please try dmpar instead. We typically don't suggest using dm+sm because results have been unfavorable.


Per node, are there only 4 cores available? If there are more, try to use all the cores available for each node. You can also increase the number of processors you're using. 36 is not very many for the size of your domains. You should be able to use 100+.
Thanks for your reply, I really appreciate it. Yes, I installed WRF with the dm+sm option; I will reinstall it with dmpar and try again. We have access to 9 nodes, each with 36 cores, but so far using only 4 cores per node has given the best CPU efficiency. I will increase the number of cores used for the computations.
 
Hi, Does this mean that you installed WRF with the dm+sm option? If so, please try dmpar instead. We typically don't suggest using dm+sm because results have been unfavorable.


Per node, are there only 4 cores available? If there are more, try to use all the cores available for each node. You can also increase the number of processors you're using. 36 is not very many for the size of your domains. You should be able to use 100+.

I would like to add that I installed WRF with both options: (dmpar) and (dm+sm). Below are excerpts from rsl.error.0000 for each. Regarding your comment about unfavorable results for dm+sm, I believe this is what you were referring to.

dmpar:
Timing for main: time 2019-01-31_12:00:05 on domain 3: 65.18736 elapsed seconds
Timing for main: time 2019-01-31_12:00:10 on domain 3: 3.05215 elapsed seconds
Timing for main: time 2019-01-31_12:00:15 on domain 3: 3.08315 elapsed seconds
Timing for main: time 2019-01-31_12:00:15 on domain 2: 188.21999 elapsed seconds
Timing for main: time 2019-01-31_12:00:20 on domain 3: 2.95739 elapsed seconds
Timing for main: time 2019-01-31_12:00:25 on domain 3: 2.93322 elapsed seconds
Timing for main: time 2019-01-31_12:00:30 on domain 3: 3.09663 elapsed seconds
Timing for main: time 2019-01-31_12:00:30 on domain 2: 12.84171 elapsed seconds
Timing for main: time 2019-01-31_12:00:35 on domain 3: 2.85621 elapsed seconds
Timing for main: time 2019-01-31_12:00:40 on domain 3: 3.07203 elapsed seconds

dm+sm:
Timing for main: time 2019-01-31_12:00:05 on domain 3: 1835.38403 elapsed seconds
Timing for main: time 2019-01-31_12:00:10 on domain 3: 1712.58850 elapsed seconds
Timing for main: time 2019-01-31_12:00:15 on domain 3: 1824.12122 elapsed seconds
Timing for main: time 2019-01-31_12:00:15 on domain 2: ********** elapsed seconds
Timing for main: time 2019-01-31_12:00:20 on domain 3: 1817.42700 elapsed seconds
Timing for main: time 2019-01-31_12:00:25 on domain 3: 1793.01123 elapsed seconds
Timing for main: time 2019-01-31_12:00:30 on domain 3: 1750.04919 elapsed seconds
Timing for main: time 2019-01-31_12:00:30 on domain 2: 9065.50586 elapsed seconds
Timing for main: time 2019-01-31_12:00:35 on domain 3: 1732.65112 elapsed seconds
Timing for main: time 2019-01-31_12:00:40 on domain 3: 1722.15918 elapsed seconds
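
If it helps to compare the two builds over a whole run rather than a few excerpts, the elapsed-seconds values in rsl.error.0000 can be averaged with a small parser such as the sketch below (an illustration only; the file name comes from this thread and the expected line format is the "Timing for main" format shown above).

/* rsl_mean.c - average the "elapsed seconds" values in an rsl file so two
   builds can be compared over a whole run. Illustrative sketch only; it lumps
   all domains together and skips lines whose seconds field overflowed. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s rsl.error.0000\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "r");
    if (!fp) { perror(argv[1]); return 1; }

    char line[512];
    double sum = 0.0, secs;
    long count = 0;
    int domain;

    while (fgets(line, sizeof line, fp)) {
        char *p = strstr(line, "on domain");
        /* sscanf fails (and the line is skipped) when the field is "**********" */
        if (p && sscanf(p, "on domain %d: %lf", &domain, &secs) == 2) {
            sum += secs;
            count++;
        }
    }
    fclose(fp);

    if (count > 0)
        printf("%ld timing lines, mean %.2f elapsed seconds per step\n", count, sum / count);
    else
        printf("no timing lines found\n");
    return 0;
}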
 