
forrtl: severe (174): SIGSEGV, segmentation fault with WRF compiled using new Intel compilers (icx, ifx)

ArminH

New member
Hello,

I was able to compile WPS and WRF 4.6.0 using the new Intel compilers. To check that everything works properly, I tried to run a real case that I had previously run with the gfortran-compiled WRF model. geogrid.exe, ungrib.exe, metgrid.exe, and real.exe all ran successfully, and I was able to generate the wrfinput and wrfbdy files, which appear to be correct. However, when I ran wrf.exe, I received the error message "forrtl: severe (174): SIGSEGV, segmentation fault occurred" right after the "Timing for processing lateral boundary for domain 1" message. I have attached the namelist.input and rsl.error.0000 files for reference. Any help would be appreciated.

Best regards,
Armin
 

Attachments

  • namelist.input
    3.7 KB · Views: 2
  • rsl.error.0000
    5.2 KB · Views: 1
Thanks for the reply. Yes, I exported all the library paths and sourced the Intel environment before compiling the code. Here are all my exports:

source /opt/intel/oneapi/setvars.sh

export CC=icx
export CXX=icpx
export FC=ifx
export F77=ifx
export F90=ifx
export MPIFC="mpiifort -fc=ifx"
export MPIF77="mpiifort -fc=ifx"
export MPIF90="mpiifort -fc=ifx"
export MPICC="mpiicc -cc=icx"
export MPICXX="mpiicpc -cxx=icpx"

export LDFLAGS="-L/home/wrf/wrf_libs_intel/lib"
export CPPFLAGS="-I/home/wrf/wrf_libs_intel/include"
export JASPERLIB=/home/wrf/wrf_libs_intel/lib
export JASPERINC=/home/wrf/wrf_libs_intel/include
export NETCDF=/home/wrf/wrf_libs_intel
export LD_LIBRARY_PATH=/home/wrf/wrf_libs_intel/lib:$LD_LIBRARY_PATH
 
Bash:
source /opt/intel/oneapi/setvars.sh
#setting environment for WRF model to run
export LD_LIBRARY_PATH=/home/workhorse/WRF_Intel/Libs/NETCDF/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/workhorse/WRF_Intel/Libs/grib2/lib:$LD_LIBRARY_PATH
export PATH=/home/workhorse/WRF_Intel/Libs/grib2/lib:$PATH

You shouldn't need to export the compiler information, because sourcing setvars.sh sets that up in the environment variables for you.

These are the settings I use with the new Intel compilers. Maybe they can help you?


How many cores are you using for the model run (mpirun -np ?? ./wrf.exe)?
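
To confirm what sourcing setvars.sh actually puts on PATH, a small check like the one below may help. It is only a sketch: the function name check_tools is made up for illustration, and the tool names in the example call are the ones from the exports earlier in this thread.

```shell
# Report which of the given commands resolve on PATH.
# Prints "found: <name> -> <path>" or "missing: <name>" for each command
# and returns non-zero if any of them is missing.
check_tools() {
    local status=0 tool path
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            echo "found: $tool -> $path"
        else
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Example, to be run in the same shell after sourcing setvars.sh:
#   check_tools icx icpx ifx mpiifort mpiicc mpiicpc
```

Running it in the same shell that later launches configure and compile shows whether the build sees the same tools as the interactive session.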
 

Looking at your namelist, there are a few things to change too. In each pair below, the first block is what you have now and the second is what I suggest.

Code:
&time_control
 run_days                            = 0,
 run_hours                           = 600000,
 run_minutes                         = 0,
 run_seconds                         = 0,
 start_year                          = 2024
 start_month                         = 05
 start_day                           = 21
 start_hour                          = 00
 end_year                            = 2024
 end_month                           = 05
 end_day                             = 22
 end_hour                            = 00

&time_control
run_days = 0,
run_hours = 24,
run_minutes = 0,
run_seconds = 0,
start_year = 2024
start_month = 05
start_day = 21
start_hour = 00
end_year = 2024
end_month = 05
end_day = 22
end_hour = 00
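
One way to check that run_hours agrees with the start and end dates is to compute the interval directly. The sketch below assumes GNU date (the -d option), and hours_between is a hypothetical helper name, not part of WRF.

```shell
# Hours between two "YYYY-MM-DD HH" timestamps, interpreted as UTC.
# Assumes GNU date; BSD/macOS date uses different flags.
hours_between() {
    local start_s end_s
    start_s=$(date -u -d "$1:00:00" +%s) || return 1
    end_s=$(date -u -d "$2:00:00" +%s) || return 1
    echo $(( (end_s - start_s) / 3600 ))
}

# Start and end dates from the &time_control block above.
hours_between "2024-05-21 00" "2024-05-22 00"   # prints 24
```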
 
Code:
&domains
 time_step                           = 30,

&domains
time_step = 54,
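
For context, a common WRF rule of thumb is to keep time_step (in seconds) at or below roughly 6 times the grid spacing dx in kilometres. The sketch below only does that arithmetic. The dx of this case is not shown in the thread, so the 9000 m in the example is a hypothetical value, and max_time_step is an illustrative name.

```shell
# Upper bound on time_step (seconds) from the 6*dx rule of thumb.
# $1 is the grid spacing dx in metres (integer), as given in &domains.
max_time_step() {
    echo $(( 6 * $1 / 1000 ))
}

max_time_step 9000   # hypothetical dx = 9 km, prints 54
```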

and maybe this too: change
e_vert = 61
to
e_vert = 45
@ArminH

Lastly, make these suggested changes one at a time to see whether any of them makes a difference. If the run still fails, zip all the rsl.out and rsl.error files from the /run folder and upload the archive here so we can look at it. Sometimes the error shows up in only one of the many rsl files.

To find the errors quickly, you can run these commands in the terminal:
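
Collecting the logs for upload could look something like the sketch below. The function name bundle_rsl and both paths in the example are placeholders, and the tar usage assumes a tar that supports -z and -C (GNU tar does).

```shell
# Bundle every rsl.out.* and rsl.error.* file from a run directory
# into a single gzip-compressed tar archive.
# $1 = run directory, $2 = path of the archive to create.
bundle_rsl() {
    local run_dir=$1 archive=$2 files
    files=$(cd "$run_dir" && ls rsl.out.* rsl.error.* 2>/dev/null || true)
    if [ -z "$files" ]; then
        echo "no rsl files found in $run_dir" >&2
        return 1
    fi
    # rsl file names contain no spaces, so word splitting is safe here.
    tar -czf "$archive" -C "$run_dir" $files
}

# Example (placeholder paths):
#   bundle_rsl /path/to/WRF/run /tmp/rsl_logs.tar.gz
```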

Code:
grep -i FATAL rsl.*
grep -i error rsl.*
grep -i SIGSEGV rsl.*
grep -i cfl rsl.*
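
The four searches above can also be rolled into one pass that reports which rsl file contains which keyword. This is a minimal sketch; scan_rsl is a made-up name, and the keyword list is simply the one from the commands above.

```shell
# For every rsl.* file in a run directory, count the lines that match
# each keyword (case-insensitive) and print "file keyword count"
# for every non-zero count.
scan_rsl() {
    local run_dir=${1:-.} f kw n
    for f in "$run_dir"/rsl.*; do
        [ -f "$f" ] || continue
        for kw in FATAL error SIGSEGV cfl; do
            # grep -c prints 0 and exits non-zero when nothing matches
            n=$(grep -ci -- "$kw" "$f" || true)
            if [ "$n" -gt 0 ]; then
                echo "$(basename "$f") $kw $n"
            fi
        done
    done
    return 0
}

# Example: scan the current directory
#   scan_rsl .
```

With many MPI ranks this narrows the search to the few rsl files worth opening.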
 
I will clean and recompile the code without exporting the compiler information. I have tried core counts from 1 to 20, without any success.
 
It seems to me that the issue lies with the code compilation rather than the settings in namelist.input. It's worth noting that I was able to successfully run the simulation using the GCC-compiled WRF model. To simplify the problem, I opted to compile the code for em_les and test it on a case in WRF/test/em_les. Unfortunately, I encountered the same segmentation fault error prior to initiating the time steps. My assumption is that one of the necessary libraries may not have been installed correctly. I will conduct further investigation into the issue and ensure that all dependencies are properly installed. I will keep you updated on my progress.
 
I am still getting the segmentation fault error for the em_les test case. I also compiled the code without exporting the compiler information, and it didn't solve the issue. I have attached all the necessary information. As can be seen in installed_libraries.txt, all the required libraries are correctly installed. The compilation was done without any errors. I set ulimit -s to unlimited. Any help would be appreciated.
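
Since resource limits are inherited from the launching shell, it can be worth confirming the stack limit in the same shell that starts wrf.exe. The helper below is only a sketch, and report_stack_limit is an illustrative name.

```shell
# Print the soft stack limit of the current shell and flag it
# when it is not unlimited.
report_stack_limit() {
    local limit
    limit=$(ulimit -s)
    if [ "$limit" = "unlimited" ]; then
        echo "stack limit: unlimited"
    else
        echo "stack limit: $limit kB (consider 'ulimit -s unlimited' before running wrf.exe)"
    fi
}

report_stack_limit
```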
 

Attachments

  • logs.tar
    300 KB · Views: 2
Hi @ArminH
If WRF compiled correctly, then the compilation itself should not be the issue. However, different compilers can sometimes detect different issues. That being said, you're running the default em_les case (based on the namelist.input file you shared), which, in theory, should work without any problems. I also notice that the model stops immediately, without processing anything. When you ran ideal.exe, how many processors did you use? If it was more than 1, can you go back and rerun ideal.exe with only a single processor? You can then use more when running wrf.exe. Let me know if that changes anything.

And one more question: you mentioned you were able to run the real-data case fine with gcc/GNU compilers. Besides the compilers, was that case 100% identical to the case you tried with Intel compilers? i.e., the exact same domain, dates, input data, physics options, etc.
 
Hi @kwerner
Thanks for your response. I had used only 1 processor to run ideal.exe. The answer to your question regarding the real-data case with gcc/GNU compilers is yes. The case was 100% identical in terms of the domain, dates, input data, physics, etc.
 
Thanks. Since you are just running the default em_les case, I tested that case using Intel compilers, and it runs without problems. For reference, I'm using ifort 2021.10.0.

My only other thoughts are that you either don't have the space to output any files in the directory where you're trying to run, or there is something wrong with your environment settings. In either case, I'd suggest seeking help from a systems administrator at your institution.
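
The disk-space possibility can be checked before involving an administrator. The sketch below tests whether a directory is writable and then prints its free space with df. The name check_run_dir is hypothetical.

```shell
# Check that a run directory accepts new files, then show free space.
# $1 = directory to check (defaults to the current directory).
check_run_dir() {
    local dir=${1:-.} probe
    probe="$dir/.write_probe.$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
    else
        echo "not writable: $dir"
        return 1
    fi
    # Free space on the filesystem that holds the directory
    df -h "$dir"
}

# Example: run from the directory where wrf.exe is launched
#   check_run_dir .
```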
 
Is ifort 2021.10.0 the classic Intel compiler that is included as a component of the Intel Parallel Studio XE suite of tools? I am currently using the Intel oneAPI compilers (icx, icpx, ifx) as part of the Intel oneAPI HPC Toolkit. Please refer to the attached document for more information.
 

Attachments

  • intel_oneAPI_compilers.txt
    3 KB · Views: 1
@ArminH Could you type out the exact steps and commands you are using to run this case, along with the namelist files? I have oneAPI installed, so I can try to recreate it. @kwerner
 