
Compiling WRF-4.5.2 with Intel OneAPI on Derecho

jaredlee

I'm trying to compile WRF-4.5.2 on Derecho using the Intel OneAPI compilers. (I've already gotten it to compile fine using the intel-classic/2023.2.1 compilers, but I'm wondering why OneAPI isn't working.) I'm getting very similar compilation failures whether I use intel-oneapi/2023.2.1 or intel-oneapi/2024.0.2, so for simplicity I'll only focus on the first one. As the errors I'm getting are identical, my hope is that the same fix could be applied with either intel-oneapi module that's currently available on Derecho.

Here's my module environment:

Code:
jaredlee@derecho7:~/programs/WRF-4.5.2-oneapi-2023.2.1> module list

Currently Loaded Modules:
  1) ncarenv/23.09 (S)   5) cdo/2.3.0               9) cray-mpich/8.1.27    13) proj/8.2.1
  2) craype/2.7.23       6) conda/latest           10) ncarcompilers/1.0.0  14) geos/3.9.1
  3) nco/5.1.9           7) madis/4.5              11) mkl/2023.2.0         15) hdf5/1.12.2
  4) ncview/2.1.9        8) intel-oneapi/2023.2.1  12) eccodes/2.25.0       16) netcdf/4.9.2

When I ran configure, I selected option 78 (dmpar) for "INTEL (ifx/icx) : oneAPI LLVM". And here's a snippet from the first error in the compilation log (I've attached the full file, along with my configure.wrf):

Code:
time mpif90 -f90=ifx -c -real-size `expr 8 \* 4` -i4  -O0 -fno-inline -no-ip -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian   -I../dyn_em  -I/glade/u/home/jaredlee/programs/WRF-4.5.2-oneapi-2023.2.1/external/esmf_time_f90  -I/glade/u/home/jaredlee/programs/WRF-4.5.2-oneapi-2023.2.1/main -I/glade/u/home/jaredlee/programs/WRF-4.5.2-oneapi-2023.2.1/external/io_netcdf -I/glade/u/home/jaredlee/programs/WRF-4.5.2-oneapi-2023.2.1/external/io_int -I/glade/u/home/jaredlee/programs/WRF-4.5.2-oneapi-2023.2.1/frame -I/glade/u/home/jaredlee/programs/WRF-4.5.2-oneapi-2023.2.1/share -I/glade/u/home/jaredlee/programs/WRF-4.5.2-oneapi-2023.2.1/phys -I/glade/u/home/jaredlee/programs/WRF-4.5.2-oneapi-2023.2.1/wrftladj -I/glade/u/home/jaredlee/programs/WRF-4.5.2-oneapi-2023.2.1/chem -I/glade/u/home/jaredlee/programs/WRF-4.5.2-oneapi-2023.2.1/inc -I/glade/u/apps/derecho/23.09/spack/opt/spack/netcdf/4.9.2/oneapi/2023.2.1/yzvj/include   module_alloc_space_1.f90
ifx: error #10106: Fatal error in /glade/u/apps/common/23.08/spack/opt/spack/intel-oneapi-compilers/2023.2.1/compiler/2023.2.1/linux/bin-llvm/xfortcom, terminated by kill signal
compilation aborted for module_alloc_space_1.f90 (code 1)

real    0m5.734s
user    0m4.081s
sys 0m1.608s
make[2]: [../configure.wrf:546: module_alloc_space_1.o] Error 1 (ignored)

The compilation fails with identical ifx "terminated by kill signal" errors for module_alloc_space_1.f90, module_alloc_space_0.f90, module_alloc_space_6.f90, module_alloc_space_7.f90, module_domain.f90, and module_dm.f90, which then cause other downstream compilation errors.

Does anyone have ideas for how to get WRF to compile using OneAPI compilers on Derecho? I had hoped it would work, since there was a PR to allow OneAPI compilers to work in 4.5.2 (add compilation stanza for Intel oneAPI for 4.5.2 (#1946) · wrf-model/WRF@45215e0), but I assume there's probably something I'm doing wrong.

Jared
 

Attachments

  • compile.log.opt78.dmpar.intel-oneapi-2023.2.1.txt (4.4 MB)
  • configure.wrf.txt (21.1 KB)
Hi Jared,

I just ran a test compile of V4.5.2 using OneAPI on Derecho and mine compiled successfully. Our configure logs are identical, so it must be differences in our environments that are causing the problem. These are my settings:

Code:
Currently Loaded Modules:
  1) ncarenv/23.09 (S)   3) intel/2023.2.1        5) cray-mpich/8.1.27   7) netcdf/4.9.2   9) ncl/6.6.2
  2) craype/2.7.23       4) ncarcompilers/1.0.0   6) hdf5/1.12.2         8) ncview/2.1.9

If you want to take a look in the directory where I compiled it, you can find it in /glade/derecho/scratch/kkeene/jaredlee/WRFV4.5.2

You're also welcome to copy over that compiled version, if it would be helpful. If you are still having trouble, I'd recommend reaching out to the CISL support group to see if they can figure out what about your environment is causing the issue.
 
Hi Karl,

Thanks. Upon switching from intel-oneapi/2023.2.1 to intel/2023.2.1, but still using compile option 78 (dmpar, oneapi, icx/ifx), I was able to get WRF to compile. I was also able to get it to compile after switching from intel-oneapi/2024.0.2 to intel/2024.0.2. I have no idea why the intel-oneapi modules are somehow preventing WRF from compiling on Derecho (even though the regular intel modules use oneapi compilers), but I'll let someone else go down that rabbit hole if they want to. What I have now is sufficient for my purposes, and hopefully this thread saves others a little bit of headache in the future.

Jared
 
Posting for posterity:
The new ifx compiler takes a TON of memory when compiling some of the larger files (module_alloc_space*, module_dm, module_domain, etc.), and the failure comes back as the fatal compiler error "#10106 ... xfortcom, terminated by kill signal". This is the system killing the compiler process for using too much memory.

Attached is an example of this happening just before my computer kicked the bucket for said lack of memory:
Screenshot from 2024-03-20 11-06-54.png
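
To check whether a failed build hit this same problem, the compile log can be scanned for the kill-signal message. The following is only a rough sketch: `killed_sources` is a hypothetical helper name, and it assumes each "error #10106 ... terminated by kill signal" line is immediately followed by its "compilation aborted for <file>" line, as in the log snippet earlier in this thread. In a parallel build the lines may interleave, so treat the output as a hint rather than a complete list.

```shell
#!/bin/sh
# Hypothetical helper: list the source files whose ifx compile was killed.
# Pairs each "error #10106 ... terminated by kill signal" line with the
# "compilation aborted for <file>" line that directly follows it.
killed_sources() {
    awk '
        /error #10106/ && /terminated by kill signal/ { killed = 1; next }
        killed && /^compilation aborted for / { print $4; killed = 0; next }
        { killed = 0 }
    ' "$1"
}

# Usage (compile.log is a placeholder for your own log file name):
#   killed_sources compile.log | sort -u
```

If the files listed are the large ones mentioned above (module_alloc_space*, module_dm, module_domain), memory exhaustion is the likely culprit.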

If possible, the interim recommendation should be for users to increase available memory (RAM or swap). Failing that, on systems like Derecho, one can compile on a compute node, which has access to more memory than the login nodes. Compiling on the login nodes is possible but succeeds only sporadically, depending on how much memory is available at that moment.
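
As a rough illustration of checking memory before starting a build, the sketch below reads `MemAvailable` from `/proc/meminfo` on Linux. The function names and the 16 GiB threshold in the usage comment are assumptions for illustration only; nothing in this thread establishes how much memory ifx actually needs for the large WRF modules.

```shell
#!/bin/sh
# Hypothetical helpers for a pre-build memory check on Linux.

# Print MemAvailable in whole GiB from a meminfo-style file
# (default: /proc/meminfo, where the value is reported in kB).
mem_available_gib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Succeed if at least $1 GiB are available; optional $2 is the meminfo file.
enough_memory() {
    avail=$(mem_available_gib "${2:-/proc/meminfo}")
    [ "${avail:-0}" -ge "$1" ]
}

# Usage (16 is an assumed threshold, not a measured requirement):
#   enough_memory 16 || echo "Low memory: consider compiling on a compute node"
```

If the check fails on a login node, one option is to start an interactive job on a compute node (for example with PBS `qsub -I`, using whatever resource request and project code your site requires) and run the compile there.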
 
Thanks, Anthony! That explanation is helpful. Hopefully Intel can corral this memory-hogging issue to make it more manageable with future compiler versions...

Jared
 