wrf.exe forrtl: severe (174): SIGSEGV, segmentation fault occurred

abidinz

New member
CentOS 8.3.2011
WRF 4.4.1
Intel Compiler 2021.5.0 (icc,ifort,mpifort,mpiicc)

I submit the job through Slurm with 1 node and WRF runs fine,
but when I use 2 nodes the error below is raised and WRF stops running.

I have tried WRF 4.4.1 and 4.3.3 with configure options 20 and 67,
and edited configure.wrf so that (a quick check of the edit is sketched below):
DM_FC = mpiifort
DM_CC = mpiicc

Any clue, or is there something I am missing in my config?
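
A quick way to verify the configure.wrf edit (a minimal sketch; the exact spacing in configure.wrf differs per configure option):

grep -E '^DM_(FC|CC)' configure.wrf
# expected after the edit (spacing will vary):
# DM_FC = mpiifort
# DM_CC = mpiicc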



Error message:
[HPC-C7:135301:0:135301] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7ffddbdbf8d0)
==== backtrace (tid: 135301) ====
0 /opt/ohpc/pub/mpi/ucx-ohpc/1.9.0/lib/libucs.so.0(ucs_handle_error+0x254) [0x7f3529d76414]
1 /opt/ohpc/pub/mpi/ucx-ohpc/1.9.0/lib/libucs.so.0(+0x235cc) [0x7f3529d765cc]
2 /opt/ohpc/pub/mpi/ucx-ohpc/1.9.0/lib/libucs.so.0(+0x23878) [0x7f3529d76878]
3 ./wrf.exe() [0x16b4194]
4 ./wrf.exe() [0x14f274c]
5 ./wrf.exe() [0x5b8e23]
6 ./wrf.exe() [0x417921]
7 ./wrf.exe() [0x4178d4]
8 ./wrf.exe() [0x417862]
9 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f36622167b3]
10 ./wrf.exe() [0x41776e]
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
wrf.exe 000000000369CE2A for__signal_handl Unknown Unknown
libpthread-2.28.s 00007F36625C8B20 Unknown Unknown Unknown
wrf.exe 00000000016B4194 Unknown Unknown Unknown
wrf.exe 00000000014F274C Unknown Unknown Unknown
wrf.exe 00000000005B8E23 Unknown Unknown Unknown
wrf.exe 0000000000417921 Unknown Unknown Unknown
wrf.exe 00000000004178D4 Unknown Unknown Unknown
wrf.exe 0000000000417862 Unknown Unknown Unknown
libc-2.28.so 00007F36622167B3 __libc_start_main Unknown Unknown
wrf.exe 000000000041776E Unknown Unknown Unknown


Slurm script:
#!/bin/bash
#####Number of nodes
#SBATCH -N2

#####Number of tasks per node
#SBATCH --ntasks-per-node=40
#SBATCH --job-name=WRF_exe
#SBATCH -o slurm_wrf_run_%j_output.txt
#SBATCH -e slurm_wrf_run_%j_error.txt

ulimit -l unlimited
ulimit -s unlimited
export KMP_STACKSIZE=20480000000
export I_MPI_HYDRA_IFACE="ib0"
export I_MPI_HYDRA_BOOTSTRAP="ssh"
export I_MPI_DEBUG=100

#
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
#export KMP_STACKSIZE=20480000000

#export I_MPI_FABRICS=shm:ofi
#export FI_PROVIDER=mlx
#export I_MPI_OFI_LIBRARY_INTERNAL=1
#export I_MPI_PIN_RESPECT_HCA=enable


cd $SLURM_SUBMIT_DIR
source /data/oneapi/setvars.sh

export DIR=$HOME/exp1
export NETCDF=$DIR
export LD_LIBRARY_PATH=$DIR/lib:$LD_LIBRARY_PATH
export PATH=$DIR/bin:$PATH
export WRF_SRC_ROOT_DIR=$HOME/WRFV3
printf '\tNetCDF:\t%s\n' "$DIR"
printf '\tPath:\t%s\n\n' "$PATH"
printf '\tSLURM_NTASKS:\t%s\n\n' "$SLURM_NTASKS"

mpirun -np $SLURM_NTASKS ./wrf.exe
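
For reference, a minimal placement check under the same allocation (an assumption: Intel MPI's mpirun from the sourced oneAPI setvars.sh is being used) to confirm that ranks actually land on both nodes:

mpirun -np $SLURM_NTASKS hostname | sort | uniq -c
# should list the two node hostnames, each with 40 ranks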
 
Did you compile WRF in dmpar mode?
How many processors does each node have?
Can you upload your namelist.input for me to take a look?
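
A quick way to check the first two points (a minimal sketch; assumes configure.wrf is still in the WRF build directory and that sinfo is available on the cluster):

grep -c DM_PARALLEL configure.wrf   # dmpar builds normally define -DDM_PARALLEL, so this should be non-zero
sinfo -o "%n %c"                    # hostname and CPU count for each node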
 