jltorchinsky
New member
Hello,
I am attempting to run WRF on Cori at NERSC and am running into some issues. Here is all of the relevant information I can think of:
WRF Version: 4.4 (c11bb76939647c4073e9a105ae00faaef55ca7fd)
Modules:
1) modules/3.2.11.4
2) darshan/3.3.1
3) craype-network-aries
4) gcc/11.2.0
5) craype/2.7.10
6) cray-mpich/7.7.19
7) craype-haswell
8) craype-hugepages2M
9) cray-libsci/20.09.1
10) udreg/2.3.2-7.0.3.1_3.16__g5f0d670.ari
11) ugni/6.0.14.0-7.0.3.1_6.4__g8101a58.ari
12) pmi/5.0.17
13) dmapp/7.1.1-7.0.3.1_3.21__g93a7e9f.ari
14) gni-headers/5.0.12.0-7.0.3.1_3.9__gd0d73fe.ari
15) xpmem/2.2.27-7.0.3.1_3.10__gada73ac.ari
16) job/2.2.4-7.0.3.1_3.17__g36b56f4.ari
17) dvs/2.12_2.2.224-7.0.3.1_3.14__gc77db2af
18) alps/6.6.67-7.0.3.1_3.21__gb91cd181.ari
19) rca/2.2.20-7.0.3.1_3.18__g8e3fb5b.ari
20) atp/3.14.9
21) perftools-base/21.12.0
22) PrgEnv-gnu/6.0.10
23) openmpi/4.1.2
24) cray-netcdf-hdf5parallel/4.8.1.1
To configure and build WRF, I followed the instructions located here. In particular, in the topmost directory, I ran
Code:
./configure
./compile em_b_wave &> log.compile
Within ./configure, I selected options 34 (dmpar for GNU) and 1 (basic for nesting). I've attached the compilation log in case it provides any insight into what is going on. At the end, the log reports that the executables ideal.exe and wrf.exe were successfully built, and they are indeed in the main subdirectory.
I've attempted to run ideal.exe both from the run subdirectory and via an sbatch script in scratch space; both attempts give the same error:
Code:
> mpirun -np 4 ideal.exe
starting wrf task 3 of 4
starting wrf task 2 of 4
starting wrf task 0 of 4
starting wrf task 1 of 4
[cori03:13704] *** An error occurred in MPI_Comm_create_keyval
[cori03:13704] *** reported by process [1165492225,0]
[cori03:13704] *** on communicator MPI_COMM_WORLD
[cori03:13704] *** MPI_ERR_ARG: invalid argument of some other kind
[cori03:13704] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[cori03:13704] *** and potentially your MPI job)
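For reference, the sbatch script was along these lines (a sketch only; the qos, time limit, and working directory below are placeholders rather than the exact values I used):

```shell
#!/bin/bash
#SBATCH --constraint=haswell   # Cori Haswell nodes
#SBATCH --qos=debug            # placeholder queue
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:30:00

# Run from the directory containing ideal.exe and the namelist files
mpirun -np 4 ./ideal.exe
```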
I suspect that this is an incompatibility between the version of MPI on Cori and the version of MPI used in developing WRF. Can anybody help diagnose this further and advise on how to resolve it?
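One detail that may be relevant: the module list above shows both cray-mpich/7.7.19 and openmpi/4.1.2 loaded, so the mpirun that gets picked up at run time may not match the MPI library the executables were linked against. A quick way to check for that kind of launcher/library mismatch (the path below is just where my executable ended up; adjust as needed):

```shell
# Which MPI launcher is first on PATH?
which mpirun

# Which MPI shared libraries is the executable actually linked against?
# (run from the top-level WRF directory)
ldd main/ideal.exe | grep -i mpi
```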