
mpirun does not seem to run geogrid in parallel

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

hconel

Hi,
I am issuing the following command to run geogrid in parallel:

Code:
mpirun -np 4 ./geogrid.exe
However, the execution does not seem to be parallelized. Instead, geogrid appears to run 4 separate times, because every output line is printed 4 times. For instance, the

Code:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!  Successful completion of geogrid.        !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

message appears in 4 distinct places in the output.
Am I issuing the command correctly? Does geogrid.exe need an additional argument to run in parallel?
Thanks,
Huseyin
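
The symptom described above can be checked mechanically by counting how often the completion banner appears in the captured output. The following is only a sketch: the temporary file stands in for output that would normally be captured from the mpirun command, and the interpretation of the count is a rule of thumb based on this thread, not documented geogrid behavior.

```shell
# Stand-in for the captured output of "mpirun -np 4 ./geogrid.exe".
# In a real run this file would be produced by redirecting the command's output.
np=4
out=$(mktemp)
for i in 1 2 3 4; do
    echo "!  Successful completion of geogrid.        !" >> "$out"
done

# Count how many times the completion banner was printed.
count=$(grep -c "Successful completion of geogrid" "$out")

# Rule of thumb (an assumption drawn from this thread): one banner per
# process suggests that each process ran as an independent serial job.
if [ "$np" -gt 1 ] && [ "$count" -eq "$np" ]; then
    verdict="banner printed once per process: likely $np independent serial runs"
elif [ "$count" -eq 1 ]; then
    verdict="banner printed once: consistent with a single parallel run"
else
    verdict="banner printed $count times: inconclusive"
fi
echo "$verdict"
rm -f "$out"
```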
 
When you ran the WPS's "configure" script before compiling, did you select one of the "(dmpar)" options?
 
Yes. When configuring, I chose option (3) for WPS:

Code:
   3.  Linux x86_64, gfortran    (dmpar)
and (34) for WRF:

Code:
 32. (serial)  33. (smpar)  34. (dmpar)  35. (dm+sm)   GNU (gfortran/gcc)

Also, the LIBRARIES were compiled without errors before that.
 
It may be possible that there's some issue with your MPI installation, or it may be a problem with the compilation of the WPS.

Could you try compiling and running the attached Fortran MPI test code?
Code:
mpif90 -o hello hello.F90
mpirun -np 4 ./hello
The program should print the following:
Code:
 Hello from task            1  of            4
 Hello from task            2  of            4
 Hello from task            3  of            4
 Hello from task            4  of            4

If it appears that the test MPI program is working, could you try cleaning, configuring, and compiling the WPS again (though at this point, I don't think there's a need to recompile the WRF model)? When compiling the WPS, could you save a compilation log and attach it to a new post in this thread?
Code:
./compile >& compile.log
 

Attachments

  • hello.F90
Mr. Duda,
I've tested the hello.F90 code and it runs as expected. The 4 tasks print the following:

Code:
 Hello from task            2  of            4
 Hello from task            3  of            4
 Hello from task            4  of            4
 Hello from task            1  of            4
I suspect that my environment variables might not be correct. I'm sharing the file that I always source before running WPS or WRF.
Code:
source set_wrf.sh
However, I have tested the hello.F90 code after sourcing the same file and it still worked correctly.
I'll try recompiling all of the WPS executables as soon as possible and report back.
Also, I happen to have saved the compilation log for the current WPS installation, which I'm attaching in any case (there were no errors). I'm also attaching the log of the LIBRARIES compilation, which had no errors either.
Thanks for your help,
Huseyin
 

Attachments

  • set_wrf.sh.txt
  • log.compile_WPS.txt
  • log.make_LIBRARIES.txt
UPDATE: I found the problem. I noticed that I had edited my set_wrf.sh file after compilation and changed the order of the directories in $PATH. The problem is solved when I prepend the binary directories under the LIBRARIES directory to $PATH instead of appending them. I guess my system MPI was being picked up instead of the MPI built under LIBRARIES.
It was a simple mistake on my part. Thanks for your time, and sorry for bothering you, Mr. Duda.
Regards,
Huseyin
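
The effect of prepending versus appending can be illustrated without any MPI installation. The sketch below creates two throwaway directories, each holding a dummy script named mpirun, and shows which one the shell resolves for each ordering of $PATH. The directory names (system/bin and LIBRARIES/mpich/bin) are hypothetical and are not taken from the attached set_wrf.sh.

```shell
# Hypothetical layout: two directories, each containing a dummy "mpirun".
demo=$(mktemp -d)
mkdir -p "$demo/system/bin" "$demo/LIBRARIES/mpich/bin"
printf '#!/bin/sh\necho system MPI\n'    > "$demo/system/bin/mpirun"
printf '#!/bin/sh\necho LIBRARIES MPI\n' > "$demo/LIBRARIES/mpich/bin/mpirun"
chmod +x "$demo/system/bin/mpirun" "$demo/LIBRARIES/mpich/bin/mpirun"

# Stand-in for the PATH that exists before the setup script is sourced.
base_path="$demo/system/bin:/usr/bin:/bin"

# Appending: the system directory is searched first, so its mpirun wins.
appended=$(PATH="$base_path:$demo/LIBRARIES/mpich/bin" sh -c 'command -v mpirun')

# Prepending: the LIBRARIES directory is searched first.
prepended=$(PATH="$demo/LIBRARIES/mpich/bin:$base_path" sh -c 'command -v mpirun')

echo "appended:  $appended"
echo "prepended: $prepended"
```

On a real system, running `command -v mpirun` and `command -v mpif90` after sourcing the setup script shows which installation the shell will actually use.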
 
Hello again,
When I run, say,

Code:
mpirun -np 10 ./geogrid.exe

a single output is written to the terminal (i.e. I see the
Code:
!  Successful completion of geogrid.        !

message once, not 10 times), but 10 of

Code:
geogrid.log.000#

files are written (numbered 0 to 9), and they all appear simultaneously, as in a parallel run. But I'm not sure. Is that expected? Is the parallel run working normally? I suspect that I might be running 10 serial runs again.

Thanks,
Huseyin
 
Huseyin,
I believe this job ran successfully in parallel mode. For a parallel run, it is expected that all of the geogrid.log.000# files are produced at almost the same time.
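
As a rough cross-check of the point above, the number of per-task log files can be compared with the number of tasks requested. This is a sketch under assumptions: the log files are simulated in a temporary directory, and the four-digit suffix follows the geogrid.log.000# pattern quoted in this thread.

```shell
np=10
run_dir=$(mktemp -d)

# Simulate the per-task logs geogrid.log.0000 ... geogrid.log.0009.
# In a real run directory these files would already exist.
i=0
while [ "$i" -lt "$np" ]; do
    : > "$run_dir/geogrid.log.$(printf '%04d' "$i")"
    i=$((i + 1))
done

# Count the log files and compare with the requested task count.
log_count=$(ls "$run_dir"/geogrid.log.* | wc -l | tr -d ' ')

if [ "$log_count" -eq "$np" ]; then
    status="found $log_count logs for $np tasks"
else
    status="found $log_count logs but expected $np"
fi
echo "$status"
rm -rf "$run_dir"
```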
 