Metgrid run with dmpar (parallel run)

syyang

New member
I am running WRF with WPS version 4.4 and would like to run metgrid using dmpar.

Searching through previous posts, I saw that many people recommend a serial build because its performance is not bad.
However, we are trying to reduce the total WRF execution time as much as possible,
so we want to shorten it by even a few minutes by running metgrid with dmpar.

I am using gfortran and openmpi in an aarch64 environment.

If I run "mpirun -np 4 ./real.exe", it runs normally in parallel, but if I run "mpirun -np 4 metgrid.exe", it does not appear to run in parallel.

I checked with the top command and 4 CPUs are actually in use, but the execution time is longer than when running metgrid in serial,
and the files metgrid.log.0000, metgrid.log.0001, metgrid.log.0002 and metgrid.log.0003 all show the same log with different timestamps.

1. I would like to check whether my metgrid.exe is running normally in parallel.
2. If parallel execution is not being performed properly, please tell me how to compile it to enable parallel execution.

Please help.

ubuntu$ mpirun -np 4 ./metgrid.exe
Processing domain 1 of 2
Processing 2024-05-19_00
GFS
Processing 2024-05-19_01
GFS
Processing 2024-05-19_02
GFS
Processing 2024-05-19_03
 

Attachments

  • configure.wps (3.7 KB)
  • metgrid.log.0000.txt (2.9 MB)
  • metgrid.log.0001.txt (2.9 MB)
  • metgrid.log.0002.txt (2.9 MB)
  • metgrid.log.0003.txt (2.9 MB)
Unfortunately I'm not able to say why it's taking longer to use dmpar for metgrid. I do believe it's working properly, though. Typically the log/error files per processor are very similar to each other, so it's hard to tell which part of the domain they are working to calculate.

How large are your domains? One of the reasons we typically recommend using serial processing for WPS is because testing has shown that it doesn't do much to improve the time to run geogrid and metgrid because they run so quickly anyway. The only reason we ever recommend using dmpar for them is if your domains are very large (thousands x thousands of grid spaces) and sometimes a single processor just isn't able to handle domains that large.
 
Thank you for your answer.

My domains are d01 = 120 x 152 and d02 = 351 x 601.
Although the domains are not very large, it currently takes 13 minutes to run metgrid.
To use this data, we need a way to cut even 3-4 minutes from those 13 minutes.
 
I suppose you could try more processors, but if it still doesn't make any difference, I'm not sure of any other way to reduce that time, unfortunately.
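One way to try this systematically is to time metgrid at several process counts and compare. The following is only a rough sketch: the helper names `elapsed_seconds` and `compare_np` are hypothetical, and it assumes `mpirun` is on the PATH and that a dmpar-built `./metgrid.exe` sits in the current directory alongside its namelist.wps.

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output,
# and print the elapsed wall-clock time in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Hypothetical helper: time metgrid once per requested process count.
# Assumes ./metgrid.exe was built with dmpar and mpirun is available.
compare_np() {
    for np in "$@"; do
        echo "np=$np: $(elapsed_seconds mpirun -np "$np" ./metgrid.exe) s"
    done
}

# Usage (from the WPS run directory):
#   compare_np 1 2 4 8
```

Each run overwrites the previous met_em files, so the only thing to compare is the elapsed time printed for each process count.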
 
Hi.

Looking at the configure.wps file, maybe you could try another build with the OpenMP flags (I think -fopenmp is the one for clang/flang; for the Intel compilers they are "-qopenmp -fpp -auto"):

SFC = armflang -fopenmp
SCC = armclang -fopenmp

and the FFLAGS, FCFLAGS, CFLAGS should have:
  • -O2 or -O3
  • -ffast-math
  • -march=native if you'll run on the same machine architecture you're building the binaries on.

Hope this helps speed up the process a bit.
 
@syyang,
I wonder whether you compiled WPS in dmpar? If so, then your command "mpirun -np 4 ./metgrid.exe" is correct for running metgrid.exe with 4 processors, and this should speed up your process.
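A quick way to check the build type might be to look at configure.wps itself. This is a minimal sketch, not an official check: the function name `is_dmpar_config` is hypothetical, and it rests on the assumption that dmpar configurations add `-D_MPI` to the preprocessor flags in configure.wps.

```shell
#!/bin/sh
# Hypothetical helper: guess the WPS build type from a configure.wps file.
# Assumption: dmpar configurations define -D_MPI in the preprocessor flags,
# while serial configurations do not.
is_dmpar_config() {
    if grep -q -e '-D_MPI' "$1"; then
        echo "dmpar"
    else
        echo "serial"
    fi
}

# Usage (from the WPS directory):
#   is_dmpar_config configure.wps
#
# As a second check on a dynamically linked executable, the MPI library
# usually shows up among its shared-library dependencies:
#   ldd ./metgrid.exe | grep -i mpi
```

If the function reports "serial", each of the 4 processes started by mpirun would simply repeat the whole job independently, which would be consistent with identical logs and no speed-up.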
 