Segmentation fault when running freshly installed WRF

sveinngauti

New member
I just installed WRF on a new computer. The computer has an Intel i9 processor and is running Ubuntu 20.04. I am using the Intel compilers from oneAPI and WRF 4.4.

I am trying to run a 9 km domain over Iceland using GFS data, but I get a segmentation fault only seconds after it starts. WPS and real.exe run successfully.

The error I'm getting is this one below:
d01 2022-09-09_18:00:00 module_io.F: in wrf_read_field
inc/wrf_bdyin.inc ext_read_field PC_BTYE memorder YE Status = 0
d01 2022-09-09_18:00:00 input_wrf: end, fid = 2
Timing for processing lateral boundary for domain 1: 0.00633 elapsed seconds
d01 2022-09-09_18:00:00 module_integrate: calling solve interface
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
wrf.exe 0000000003182AEA for__signal_handl Unknown Unknown
libc.so.6 00007FA6F461A520 Unknown Unknown Unknown
wrf.exe 00000000016978C4 Unknown Unknown Unknown
wrf.exe 00000000014F1EB8 Unknown Unknown Unknown
wrf.exe 00000000005B664B Unknown Unknown Unknown
wrf.exe 0000000000415251 Unknown Unknown Unknown
wrf.exe 000000000041520F Unknown Unknown Unknown
wrf.exe 00000000004151A2 Unknown Unknown Unknown
libc.so.6 00007FA6F4601D90 Unknown Unknown Unknown
libc.so.6 00007FA6F4601E40 __libc_start_main Unknown Unknown
wrf.exe 00000000004150A5 Unknown Unknown Unknown

My met files look normal and so does the wrfinput file. I've tried changing all sorts of variables in namelist.input, but I always run into the same error. I have also looked for the same error online, but haven't found anything.

My other computer, running WRF 3.8 with GNU compilers, runs this same forecast without errors. Does anyone have any idea what might cause this error? I've attached my namelist files.
 

Attachments

  • namelist.input.txt
    2.8 KB · Views: 1
  • namelist.wps.txt
    765 bytes · Views: 1
  • rsl.error.0000.txt
    435.7 KB · Views: 1
Can I see your config log for WRF and WPS?

Also, which Intel compiler packages did you install?
 
I have attached the configure logs. I installed the Intel oneAPI Base Toolkit as well as the Intel oneAPI HPC Toolkit.
 

Attachments

  • configure.wps.txt
    3.3 KB · Views: 2
  • configure.wrf.txt
    23.5 KB · Views: 2
In WRF configure log:
#
# If you have Intel MPI installed and wish to use instead, make the
# following changes to settings below:
# DM_FC = mpiifort
# DM_CC = mpiicc
# and source bin64/mpivars.sh file from your Intel MPI installation
# before the build.

DESCRIPTION = INTEL ($SFC/$SCC)
DMPARALLEL = 1
OMPCPP = # -D_OPENMP
OMP = # -qopenmp -fpp -auto
OMPCC = # -qopenmp -fpp -auto
SFC = ifort
SCC = icc
CCOMP = icc
DM_FC = mpif90 -f90=$(SFC)
DM_CC = mpicc -cc=$(SCC)
FC = time $(DM_FC)

In WPS configure log:
COMPRESSION_LIBS = -L/home/sveinn/wrf_libs_intel/lib -ljasper -lpng -lz
COMPRESSION_INC = -I/home/sveinn/wrf_libs_intel/include
FDEFS = -DUSE_JPEG2000 -DUSE_PNG
SFC = ifort
SCC = icc
DM_FC = mpif90
DM_CC = mpicc

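So you'll need to change the build setup so that DM_FC and DM_CC use the Intel MPI wrappers (mpiifort and mpiicc) instead of mpif90 and mpicc. These are the correct exports: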

Bash:
# some of the libraries we install below need one or more of these variables
export CC=icc
export CXX=icpc
export FC=ifort
export F77=ifort
export F90=ifort
export MPIFC=mpiifort
export MPIF77=mpiifort
export MPIF90=mpiifort
export MPICC=mpiicc
export MPICXX=mpiicpc

For WRF
Bash:
# Removing user input for configure.  Choosing correct option for configure with Intel compilers
sed -i '420s/<STDIN>/15/g' $HOME/HWRF/WRF/WRF-4.3.3/arch/Config.pl

CC=$MPICC FC=$MPIFC CXX=$MPICXX F90=$MPIF90 F77=$MPIF77 ./configure # option 15

# Replace the MPICH/GNU-style wrapper calls with the Intel MPI wrappers
# (line numbers 170 and 171 are specific to this configure.wrf; check yours)
sed -i '170s|mpif90 -f90=$(SFC)|mpiifort|g' $HOME/HWRF/WRF/WRF-4.3.3/configure.wrf
sed -i '171s|mpicc -cc=$(SCC)|mpiicc|g' $HOME/HWRF/WRF/WRF-4.3.3/configure.wrf

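# Note: nmm_real and the paths here appear to come from an HWRF build script;
# for an ARW real-data case like the one in this thread, the target would be em_real.
# CPU_HALF_EVEN must be set beforehand to the number of parallel make jobs.
./compile -j $CPU_HALF_EVEN nmm_real |& tee wrf.nmm.log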

export WRF_DIR=$HOME/HWRF/WRF/WRF-4.3.3

For WPS
Bash:
# Removing user input for configure.  Choosing correct option for configure with Intel compilers
sed -i '141s/<STDIN>/19/g' $HOME/HWRF/WRF/WPS-4.3.1/arch/Config.pl

./configure -D #Option 19 for Intel and distributed memory

sed -i '65s|mpif90|mpiifort|g' $HOME/HWRF/WRF/WPS-4.3.1/configure.wps
sed -i '66s|mpicc|mpiicc|g' $HOME/HWRF/WRF/WPS-4.3.1/configure.wps

If you compiled any of the libraries with GNU, you may need to go back and rebuild them with the Intel compilers.
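One way to confirm that the wrapper substitutions took effect is to print the DM_FC and DM_CC lines from the generated configure file. This is only a sketch: the function name show_mpi_wrappers and the CONFIG_FILE variable are made up for illustration, and the default file name assumes you run it from the WRF source directory.

```shell
# Print only the active (uncommented) DM_FC / DM_CC assignment lines.
# $1 = path to configure.wrf or configure.wps
show_mpi_wrappers() {
    grep -E '^[[:space:]]*DM_(FC|CC)[[:space:]]*=' "$1"
}

# CONFIG_FILE is a hypothetical variable; point it at your own configure file.
CONFIG_FILE="${CONFIG_FILE:-configure.wrf}"
if [ -f "$CONFIG_FILE" ]; then
    show_mpi_wrappers "$CONFIG_FILE"
else
    echo "No $CONFIG_FILE here; run this from the WRF or WPS source directory."
fi
```

After the sed edits above, both lines should name mpiifort and mpiicc rather than mpif90 and mpicc.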
 
Thank you for all your help, Will. All the libraries were compiled with the Intel compilers. I've compiled everything according to your instructions. The main change was to go from mpif90 to mpiifort and from mpicc to mpiicc.

I am still getting an error just after starting wrf. It happens in a slightly different place now though. This is the segmentation fault I'm currently getting:
ThompMP: read qr_acr_qgV3.dat instead of computing
ThompMP: read qr_acr_qsV2.dat instead of computing
ThompMP: read freezeH2O.dat instead of computing
Timing for Writing wrfout_d01_2022-09-09_18:00:00 for domain 1: 0.12708 elapsed seconds
d01 2022-09-09_18:00:00 Input data is acceptable to use: wrfbdy_d01
Timing for processing lateral boundary for domain 1: 0.00356 elapsed seconds
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
wrf.exe 0000000003182AEA for__signal_handl Unknown Unknown
libc.so.6 00007F0525C1A520 Unknown Unknown Unknown
wrf.exe 00000000016978C4 Unknown Unknown Unknown
wrf.exe 00000000014F1EB8 Unknown Unknown Unknown
wrf.exe 00000000005B664B Unknown Unknown Unknown
wrf.exe 0000000000415251 Unknown Unknown Unknown
wrf.exe 000000000041520F Unknown Unknown Unknown
wrf.exe 00000000004151A2 Unknown Unknown Unknown
libc.so.6 00007F0525C01D90 Unknown Unknown Unknown
libc.so.6 00007F0525C01E40 __libc_start_main Unknown Unknown
wrf.exe 00000000004150A5 Unknown Unknown Unknown

I've attached my configure files and the rsl.error file. The namelist files are the same. Do you have any idea what could cause this?
 

Attachments

  • rsl.error.0000.txt
    3.7 KB · Views: 4
  • configure.wps.txt
    3.3 KB · Views: 3
  • configure.wrf.txt
    23.5 KB · Views: 3
This is where I get a little less comfortable with WRF. I know the installation issues pretty well, but when it comes to debugging past that I get a little iffy.

First question: do you have all the .exe files installed, and are none of them broken links?
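A quick way to answer that from the shell is sketched below. The function name check_exes and the RUN_DIR variable are made up for illustration; the check only looks at wrf.exe and real.exe, the two executables mentioned in this thread.

```shell
# Report whether wrf.exe and real.exe in a directory are usable.
# $1 = directory to inspect; returns non-zero if anything is wrong.
check_exes() {
    dir="$1"
    status=0
    for exe in wrf.exe real.exe; do
        path="$dir/$exe"
        if [ -L "$path" ] && [ ! -e "$path" ]; then
            # A symlink whose target no longer exists.
            echo "BROKEN LINK: $path"
            status=1
        elif [ ! -x "$path" ]; then
            echo "MISSING OR NOT EXECUTABLE: $path"
            status=1
        else
            echo "OK: $path"
        fi
    done
    return $status
}

# RUN_DIR is a hypothetical variable; it defaults to the current directory.
check_exes "${RUN_DIR:-.}" || echo "Some executables need attention."
```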

Next, can you please include your WRF namelist.input and WPS namelist.wps? My guess, based on reading other posts in the forum, is that there is an error in the namelist and that is causing the program to break.

Now, in the configure.wrf file I saw this:

# ESMFINCLUDEGOESHERE


#### NETCDF4 pieces

NETCDF4_IO_OPTS = -DUSE_NETCDF4_FEATURES -DWRFIO_NCD_LARGE_FILE_SUPPORT
GPFS =
CURL =
HDF5 =
ZLIB =
DEP_LIB_PATH =
NETCDF4_DEP_LIB = $(DEP_LIB_PATH) $(HDF5) $(ZLIB) $(GPFS) $(CURL)

# NETCDF4INCLUDEGOESHERE

#### CTSM pieces

# CTSMINCLUDEGOESHERE

I am not sure if the HDF5 and ZLIB libraries need to be set here. I would think they do, but I am not certain. Please make sure that both of these libraries are in your LD_LIBRARY_PATH and PATH variables.
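The path check can be done with a small helper like the one below. This is a sketch: path_contains and LIB_PREFIX are made-up names, and the default prefix is simply the directory that appears in the WPS configure log earlier in the thread. It assumes the libraries were installed under lib and bin below that prefix.

```shell
# Return success if directory $2 appears in the colon-separated list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# LIB_PREFIX is a hypothetical variable; the default is taken from the
# COMPRESSION_LIBS line in the WPS configure log above.
LIB_PREFIX="${LIB_PREFIX:-/home/sveinn/wrf_libs_intel}"

if path_contains "${LD_LIBRARY_PATH:-}" "$LIB_PREFIX/lib"; then
    echo "LD_LIBRARY_PATH includes $LIB_PREFIX/lib"
else
    echo "LD_LIBRARY_PATH is missing $LIB_PREFIX/lib"
fi

if path_contains "${PATH:-}" "$LIB_PREFIX/bin"; then
    echo "PATH includes $LIB_PREFIX/bin"
else
    echo "PATH is missing $LIB_PREFIX/bin"
fi
```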

I'm sure one of the admins @kwerner or @Ming Chen will see this and verify if I am correct or not.
 
I set both HDF5 and ZLIB and recompiled everything. Still getting the same segmentation fault. I wonder if it could be something changed with the namelist since v 4.0. Using the same input gribs and namelists it runs smoothly on another computer running wrf 4.0 using gnu compilers.
 
You mentioned you were using WRF 3.8 with GNU in your first message. Are you using the same geog files from the 3.8 version for the 4.0 version?
 
I'm running geogrid on the new install, so the geog files are the same version. But it looks like it might be an issue with the landuse data, or that's my best guess. I've tried running with different landuse, but still get the same error. I'm still not sure if this is an installation issue or a namelist issue. I'm still completely lost.
 
Okay, just for grins double check that these 4.0 files are all in the folder where you have your geog files.

Sadly, I have reached the limit of my knowledge. I do apologize. Hopefully the admins will get online later today and take a look at this in more depth.
I'm still learning all the ins and outs of namelist files and I don't want to make it worse.
 
@sveinngauti,
Do you have access to multiple processors? I see you've compiled the code for the distributed memory option (dmpar), which should allow you to run on multiple processors. If so, can you try to run with more than a single processor? Try 4 or 8 to see if that makes any difference. Please also set debug_level = 0. This is something we have removed from recent default namelists because it's not very helpful and ends up adding a lot of junk to the rsl* files, making them unreadable and very large.

This is unrelated, but another recommendation is to increase the size of your domain. We advise using domains no smaller than 100x100 grid points in order to get reasonable results. You may find this best practices page helpful.
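The domain size can be read straight from namelist.input, where e_we and e_sn hold the west-east and south-north dimensions. The sketch below checks only the first value on each line (domain 1); first_value, check_domain_size and NAMELIST are made-up names for illustration.

```shell
# Print the first (domain 1) integer value of a namelist variable.
# $1 = variable name, $2 = namelist file
first_value() {
    sed -n -E "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*([0-9]+).*/\1/p" "$2" | head -n 1
}

# Warn when either horizontal dimension of domain 1 is below 100 grid points.
check_domain_size() {
    we=$(first_value e_we "$1")
    sn=$(first_value e_sn "$1")
    if [ -z "$we" ] || [ -z "$sn" ]; then
        echo "Could not read e_we/e_sn from $1"
        return 2
    fi
    if [ "$we" -lt 100 ] || [ "$sn" -lt 100 ]; then
        echo "Domain 1 is ${we}x${sn}: smaller than 100x100"
        return 1
    fi
    echo "Domain 1 is ${we}x${sn}: OK"
}

# NAMELIST is a hypothetical variable; it defaults to the file in this directory.
NAMELIST="${NAMELIST:-namelist.input}"
if [ -f "$NAMELIST" ]; then
    check_domain_size "$NAMELIST" || true
fi
```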
 
@kwerner thank you for the assistance. Compiling for dmpar was a mistake. I've only got 1 CPU. I've recompiled the code with smpar, and I still get the segmentation fault just after I start it up. I've tried changing the domain so it's bigger than 100x100, but I get the same error. Could this be due to the physics scheme, or is it more likely that it's due to some installation error?
 
Is there a test case somewhere with met_em files and a namelist.input I can try to run in order to figure out if it's the installation that's not working or if it has something to do with my particular run?
 
Instead of compiling with the 'smpar' option, can you try a serial option and then just run with the command
./wrf.exe >& wrf.log

If it's still failing, you can try to simplify the case for troubleshooting purposes. I recommend using the default namelist.input file that comes with the WRF code and then only modifying the domain and time information. Don't add any extra options (adaptive time step, for example) and don't modify any of the physics options. If it runs, then you'll know it's one of the options you've added or modified in the namelist, and you can reintroduce your changes one at a time to determine which one makes it fail.
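To see exactly which options differ from the defaults, a plain diff of the two namelists is usually enough. This is a sketch: namelist_diff, DEFAULT_NL and MY_NL are made-up names, and both default paths are placeholders that you would point at your own copies.

```shell
# Show lines that differ between two namelists, ignoring changes in the
# amount of whitespace. diff exits with status 1 when the files differ.
# $1 = default namelist, $2 = your namelist
namelist_diff() {
    diff -b "$1" "$2"
}

# Both paths are hypothetical placeholders.
DEFAULT_NL="${DEFAULT_NL:-namelist.input.default}"
MY_NL="${MY_NL:-namelist.input}"
if [ -f "$DEFAULT_NL" ] && [ -f "$MY_NL" ]; then
    namelist_diff "$DEFAULT_NL" "$MY_NL" || true
fi
```

Each line prefixed with > is a setting from your own namelist that differs from the default, which gives a short list of candidates to revert one at a time.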

If it still fails with the default namelist, I think it's likely due to the number of processors you're using. If you send me your updated namelist.input, wrfbdy_d01 and wrfinput_d01 files, I can try to test them for you. If they are too large to attach to your post, see the home page of the forum for information on sharing large files. Thanks!
 