
MPI error when running idealized WRF (v4.2.1) on Derecho

pfinocchio

New member
Hello,

I'm trying to run idealized simulations (em_quarter_ss) using WRF v4.2.1 for the first time on Derecho. This version of the model ran on Cheyenne (SGI_MPT), and the code has not changed since porting it over to Derecho.

I compiled WRF successfully using option 50 (Intel Cray XC, dmpar). However, when I try to run the model, it immediately crashes and every rank throws the following error:

Code:
symbol lookup error: ... undefined symbol: mpi_initialized
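
In case it's useful for diagnosing: my understanding is that a symbol lookup error like this usually means the executable resolves a different MPI library at run time than the one it was linked against. A quick way to check which library the binary picks up (run in the directory containing wrf.exe, with the same modules loaded as in the job script; the grep pattern is just illustrative):

Code:
# Show which shared MPI library the executable resolves at run time
ldd ./wrf.exe | grep -i mpi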

Clearly something very basic is wrong with how I'm running the model. I updated my PBS job scripts based on the online documentation for running MPI jobs on Derecho, so I'm not quite sure what the issue is. I also compiled with option 15 and had the same issue.

I've attached configure.wrf, the compile log, the namelist, the PBS job script, and the PBS log file containing the errors.

The modules I load in the job script and at compile time are:

Code:
module --force purge
module load ncarenv/23.06
module load intel-classic
module load ncarcompilers
module load cray-mpich
module load craype
module load netcdf-mpi
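
For reference, the job script follows the general pattern from the Derecho documentation; a minimal sketch of its shape (the project code, walltime, and core counts below are placeholders, not my exact values):

Code:
#!/bin/bash
#PBS -N em_quarter_ss
#PBS -A <project_code>                   # placeholder account code
#PBS -q main
#PBS -l walltime=01:00:00
#PBS -l select=1:ncpus=128:mpiprocs=128  # one full Derecho node
#PBS -j oe

cd $PBS_O_WORKDIR
mpiexec ./wrf.exe                        # MPI launch per the Derecho docs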

Any help is much appreciated.

Thanks!
 

Attachments

  • namelist.input (7.2 KB)
  • pbsjob.log (7.2 KB)
  • compile.txt (852.4 KB)
  • configure.txt (21 KB)
  • run_wrf.job.txt (494 bytes)
Hi,
Do you mind pointing us to your WRF running directory on Derecho so we can take a look? Thanks.
 
Thanks! I did a test with your input files, namelist, etc., but with a version of WRF I compiled on Derecho, and I was able to run it without any issues (I only ran about 4 hours before I stopped it). I'm not certain whether this has anything to do with it, but I don't think you should have two types of MPI modules loaded at once. These are the modules I have loaded:

Code:
Currently Loaded Modules:
  1) ncarenv/23.09 (S)   3) intel/2023.2.1        5) cray-mpich/8.1.27   7) netcdf/4.9.2   9) ncl/6.6.2
  2) craype/2.7.23       4) ncarcompilers/1.0.0   6) hdf5/1.12.2         8) ncview/2.1.9

And I chose option 50 (Intel) when I compiled. If you're interested in seeing (or even using) my compiled version and the test I conducted, you can find it at /glade/derecho/scratch/kkeene/pfinocchio/wrfv4.5.2.
 
Thank you for your help. I tried running a small test case by linking all of the files in the run directory into a test directory on my scratch space (/glade/derecho/scratch/pfin/test), and it ran successfully, so it's good to know I can at least run the model.
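
For the record, the test setup was just symlinks into a fresh directory, roughly as below (assuming the files come from the working build; in a standard idealized build the executables and lookup tables end up under test/em_quarter_ss, so the exact source path is illustrative):

Code:
mkdir -p /glade/derecho/scratch/pfin/test
cd /glade/derecho/scratch/pfin/test
# link run-time files from the working build (source path illustrative)
ln -s /glade/derecho/scratch/kkeene/pfinocchio/wrfv4.5.2/test/em_quarter_ss/* .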

However, when I re-ran my own case with the same modules loaded, it failed with the same mpi_initialized errors. So my guess is that the mpi_initialized error relates to how I compiled WRF rather than to the modules I have loaded when I run it. I need to continue using v4.2.1 because my idealized simulations rely on specific alterations to the WRF code that have only been tested in v4.2.1.

I've tried re-compiling my version of WRF with more standard modules. However, the option 50 compile fails with both of the following default module sets (compile logs attached).

Latest ncarenv with the default Intel compilers:

Code:
  1) ncarenv/23.09 (S)   3) intel/2023.2.1        5) cray-mpich/8.1.27   7) netcdf/4.9.2   9) ncl/6.6.2
  2) craype/2.7.23       4) ncarcompilers/1.0.0   6) hdf5/1.12.2         8) ncview/2.1.9

Previous ncarenv with the default Intel compilers:

Code:
  1) ncarenv/23.06 (S)   3) intel/2023.0.0        5) cray-mpich/8.1.25   7) netcdf/4.9.2   9) ncl/6.6.2
  2) craype/2.7.20       4) ncarcompilers/1.0.0   6) hdf5/1.12.2         8) ncview/2.1.8
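
For each attempt I did a clean rebuild following the standard WRF build sequence, roughly as below (the log file names are just what I used):

Code:
module list 2>&1 | tee modules_at_compile.txt  # record the environment used
./clean -a                                     # wipe the previous build
./configure                                    # select option 50 (dmpar)
./compile em_quarter_ss >& compile.log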

Could you please share the modules you had loaded when you compiled your test version of WRF v4.5.2 with option 50? Based on the compile log in your directory, it looks like you used ncarenv v23.06 (not 23.09).

Thank you again!
 

Attachments

  • compile.tvpds.option50v23.06.txt (868.2 KB)
  • compile.tvpds.option50v23.09.txt (868 KB)
You're actually right. It looks like the version I compiled (I built it a while back and just copied it to the /glade/scratch/kkeene/pfinocchio directory) used an older version of ncarenv. I also just realized that I'm pretty sure I compiled with intel-classic rather than plain intel.

I just tested again and recompiled WRF v4.2.1 from scratch, using the following setup:


Code:
Currently Loaded Modules:

  1) ncarenv/23.09 (S)   3) hdf5/1.12.2    5) ncview/2.1.9   7) cray-mpich/8.1.27     9) intel-classic/2023.2.1
  2) craype/2.7.23       4) netcdf/4.9.2   6) ncl/6.6.2      8) ncarcompilers/1.0.0

I was able to compile v4.2.1 and run your case with it. Furthermore, I compiled your specific copy of v4.2.1, which I grabbed from /glade/u/home/pfin/WRF-4.2.1/, in case modifications you made were causing the compile issues. Using the same loaded modules as above, it also compiled and ran without any problems. So maybe it's the intel-classic/2023.2.1 module that needs to be loaded before you compile, along with the others I have set. Can you give that a try and see if it makes a difference?
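
If it helps, you shouldn't need to purge everything: with Lmod the compiler can be swapped in place before a clean rebuild (dependent modules should be reloaded automatically). Something along these lines:

Code:
module swap intel intel-classic/2023.2.1  # replace the default compiler module
module list                               # confirm the set matches the above
./clean -a && ./configure                 # re-select option 50
./compile em_quarter_ss >& compile.log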
 
The original version of the model (v4.2.1 with code changes specific to this idealized TC setup) now compiles and runs using the module set you provided above. The solution was simply replacing the default Intel compiler module (intel/2023.2.1) with the classic one (intel-classic/2023.2.1).
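
For anyone else hitting this: the entire difference between my failing and working builds was that one compiler module, with everything else as in the listing above:

Code:
# failing build:  module load intel/2023.2.1
module load intel-classic/2023.2.1   # working build for this v4.2.1 code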

Thanks very much for your help!
 