
Controlling the number of threads used by wrf.exe

DiegoS

New member
Dear WRF Support community,

I am using WRF V4.4.2, compiled with the dmpar option, and I am having difficulties running wrf.exe in parallel: WRF somehow seems to multiply the number of threads I want to use by 4, and the run sometimes hangs/freezes at this point:
taskid: 0 hostname: ac4-2018.bullx
module_io_quilt_old.F 2931 F
Quilting with 1 groups of 0 I/O tasks.
Ntasks in X 10 , ntasks in Y 12
Domain # 1: dx = 9000.000 m
Domain # 2: dx = 3000.000 m
WRF V4.4.2 MODEL
git commit 6233639c599119e76fca17dba9ea211af53a0ba9 4164 files changed, 233 insertions(+), 6 deletions(-)
*************************************
Parent domain
ids,ide,jds,jde 1 200 1 200
ims,ime,jms,jme -4 27 -4 24
ips,ipe,jps,jpe 1 20 1 17
*************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
alloc_space_field: domain 1 , 36958448 bytes allocated
RESTART run: opening wrfrst_d01_2018-05-30_03:00:00 for reading
Input data is acceptable to use: wrfrst_d01_2018-05-30_03:00:00

The run finally stops when the job reaches its walltime limit, and no errors appear. However, after contacting HPC support I got the following answer:

I do not understand how WRF decides how many threads to spawn; it seems to be ignoring any OpenMP settings you pass it, and it varies if you change the number of tasks or the number of nodes. With 80 tasks on one node, WRF decides to use 4 threads per task, which exceeds the number of available CPUs on the node.
I am pretty sure the problem was due to a race condition caused by the uncontrolled placement of the threads on the different cores. The proper solution would be, if possible, to set up your WRF experiment in a way that lets you configure both tasks and threads, so that you can make sure that each thread gets a CPU.
If you know how to set up WRF in such a way that you can force it to use a fixed number of tasks and threads per task, that would be the way to go. Then you ensure that the number of threads per task matches the "-c" (cpus-per-task) option in slurm/srun, and each thread will get a CPU.
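If I understand that suggestion correctly, for a hypothetical dm+sm (MPI + OpenMP) build the matching would look roughly like this (the numbers are only illustrative):

#SBATCH --ntasks=20                          # MPI tasks
#SBATCH --cpus-per-task=4                    # OpenMP threads per task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK  # one thread per allocated CPU
srun --cpus-per-task=$SLURM_CPUS_PER_TASK ./wrf.exe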

I have tried different configurations, but it always seems that the system spawns 4 threads per task. The last configuration I used was the following:

#!/bin/sh
#SBATCH --job-name=WRF_Ens_Memb
#SBATCH --qos=np
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=20
#SBATCH --ntasks=20
#SBATCH --time=00:30:00
#SBATCH --hint=nomultithread

module load prgenv/intel intel/2021.4.0 hpcx-openmpi/2.9.0
module load netcdf4/4.9.1
module load pnetcdf

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$NETCDF4_DIR/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PNETCDF_DIR/lib

export OMP_NUM_THREADS=1
srun --ntasks=20 -c 1 --hint=multithread ./wrf.exe

If I log in to the node where wrf.exe is running, I can see how many processes and threads I am using with the following commands:

# Processes:
top -bn1 -c | grep wrf.exe | grep -v srun | grep -v grep | wc -l

# Threads:
top -Hbn1 -c | grep wrf.exe | grep -v srun | grep -v grep | wc -l

And I find that I am using 20 processes but 80 threads...
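For what it is worth, ps can also show the per-process thread count directly (assuming the standard procps ps is available on the compute node):

# NLWP column = number of threads of each wrf.exe process
ps -o pid,nlwp,cmd -C wrf.exe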

Is there a way of controlling the number of threads used in wrf.exe simulations?

Any help with this problem would be greatly appreciated.

Thanks in advance,

Best,
Diego
 
Please upload your namelist.input for me to take a look.
I suspect that you have specified nproc_x and nproc_y, which may cause some issues in your case.
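A quick way to double-check from your run directory would be something like:

grep -Ei 'nproc_(x|y)' namelist.input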
 
Thanks, Ming Chen, for your quick response. Attached is the namelist.input I am using. It seems I did not specify nproc_x and nproc_y explicitly...
 

Attachments

  • namelist.input
    4.4 KB · Views: 2
Diego,

If you compile WRF in dmpar mode, then you should run it with MPI. In this case, below is a sample command I would use:

mpirun -np 16 ./wrf.exe

This command tells WRF to run with 16 processors (MPI tasks), and the domain will be decomposed automatically.

There is no need to set OMP_NUM_THREADS=1.

Note that the mpirun command can be different depending on the machine you use. Please consult your system administrator about this.
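On a SLURM machine, for example, the launcher is often srun rather than mpirun; a rough equivalent (the task count is only illustrative) would be:

# pure MPI run: one task per CPU, no OpenMP threads
srun --ntasks=20 --cpus-per-task=1 ./wrf.exe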
 