Dear WRF Support community,
I am using WRF V4.4.2, compiled with the dmpar option, and I am having difficulty running wrf.exe in parallel: WRF seems to spawn four times as many threads as I request, and the run sometimes hangs at this point:
taskid: 0 hostname: ac4-2018.bullx
module_io_quilt_old.F 2931 F
Quilting with 1 groups of 0 I/O tasks.
Ntasks in X 10 , ntasks in Y 12
Domain # 1: dx = 9000.000 m
Domain # 2: dx = 3000.000 m
WRF V4.4.2 MODEL
git commit 6233639c599119e76fca17dba9ea211af53a0ba9 4164 files changed, 233 insertions(+), 6 deletions(-)
*************************************
Parent domain
ids,ide,jds,jde 1 200 1 200
ims,ime,jms,jme -4 27 -4 24
ips,ipe,jps,jpe 1 20 1 17
*************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
alloc_space_field: domain 1 , 36958448 bytes allocated
RESTART run: opening wrfrst_d01_2018-05-30_03:00:00 for reading
Input data is acceptable to use: wrfrst_d01_2018-05-30_03:00:00
The run only stops when the job reaches its walltime limit. No errors appear. However, after contacting HPC support, I got the following answer:
I do not understand how WRF decides how many threads to spawn, it seems to be ignoring any OpenMP settings you pass it, and it varies if you change number of tasks or number of nodes. With the 80 tasks on one node, WRF decides to use 4 threads per task, which exceeds the number of available cpus on the node.
I am pretty sure the problem was due to a race condition given by the uncontrolled placing of the threads on the different cores. The proper solution would be, if possible, to set up your WRF experiment in a way that you can actually configure both tasks and threads, so that you could make sure that each thread gets a CPU.
If you know how to set up WRF in such a way that you can force it to use a fixed number of tasks and threads per task, that would be the way to go. Then you ensure that the number of threads per task matches the "-c" (cpus-per-task) option in slurm/srun, and each thread will get a cpu.
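To make the arithmetic behind that reply concrete for my case, here is a minimal sketch. The helper names threads_total and oversubscribed are my own and purely illustrative; the numbers are the 20 tasks and 4 threads per task I observe, against the 20 CPUs I request (20 tasks with 1 CPU per task).

```shell
#!/bin/sh
# Hypothetical helper: total threads on a node.
# $1 = MPI tasks on the node, $2 = threads spawned by each task.
threads_total() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: succeeds (exit 0) when tasks * threads exceeds
# the number of CPUs available, given as $3.
oversubscribed() {
    [ "$(threads_total "$1" "$2")" -gt "$3" ]
}

# 20 tasks with 4 threads each, as observed in my last run.
threads_total 20 4   # prints 80

# 80 threads on the 20 CPUs requested: more threads than CPUs.
if oversubscribed 20 4 20; then
    echo "more threads than CPUs"
fi
```

With one thread per task (20 x 1 = 20) the same check would pass, which is the situation I am trying to reach.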
I have tried different configurations, but the system always seems to use 4 threads per MPI task. The last configuration I used was the following:
#!/bin/sh
#SBATCH --job-name=WRF_Ens_Memb
#SBATCH --qos=np
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=20
#SBATCH --ntasks=20
#SBATCH --time=00:30:00
#SBATCH --hint=nomultithread
module load prgenv/intel intel/2021.4.0 hpcx-openmpi/2.9.0
module load netcdf4/4.9.1
module load pnetcdf
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$NETCDF4_DIR/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PNETCDF_DIR/lib
export OMP_NUM_THREADS=1
srun --ntasks=20 -c 1 --hint=multithread ./wrf.exe
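Since HPC support suspects the placement of threads on cores, one way to see which CPUs each task is actually allowed to run on might be the following sketch. It assumes a Linux node (it reads the Cpus_allowed_list field from /proc); the function name allowed_cpus and the script name show_binding.sh are my own, not part of WRF or Slurm.

```shell
#!/bin/sh
# Print the list of CPUs this process may run on (Linux only).
# Cpus_allowed_list is a field of /proc/<pid>/status; the awk child
# inherits the CPU affinity of the shell that starts it.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status
}

# SLURM_PROCID is set by srun for each task; "?" is printed when the
# script is run outside a job step.
echo "task ${SLURM_PROCID:-?}: cpus $(allowed_cpus)"
```

Saved as show_binding.sh (hypothetical name) and launched with the same options as wrf.exe, for example `srun --ntasks=20 -c 1 --hint=multithread ./show_binding.sh`, it should print one line per task with the CPUs that task is bound to.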
If I log in to the node where wrf.exe is running, I can see how many processes and threads I am using with the following commands:
# Processes:
top -bn1 -c | grep wrf.exe | grep -v srun | grep -v grep | wc -l
# Threads:
top -Hbn1 -c | grep wrf.exe | grep -v srun | grep -v grep | wc -l
The result is that I am using 20 processes but 80 threads.
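For a per-process breakdown rather than a single total, a minimal sketch that reads the thread count of each process from /proc could look like this. It assumes a Linux node with pgrep available; the function name threads_of_pid is my own.

```shell
#!/bin/sh
# Print the number of threads of the process with PID $1 (Linux only).
# The "Threads:" field of /proc/<pid>/status holds the thread count.
threads_of_pid() {
    awk '/^Threads:/ {print $2}' "/proc/$1/status"
}

# List "PID THREADS" for every process named exactly wrf.exe.
# Prints nothing if no wrf.exe is running on this node.
for pid in $(pgrep -x wrf.exe 2>/dev/null); do
    echo "$pid $(threads_of_pid "$pid")"
done
```

This would show whether every wrf.exe process carries the same number of threads or only some of them do.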
Is there a way of controlling the number of threads used in wrf.exe simulations?
Any help with this problem would be much appreciated.
Thanks in advance,
Best,
Diego