Segmentation fault while running wrf.exe using sf_urban_physics = 1

mukeshkhadav00

New member
I am running the WRF model for an event with the urban physics scheme (sf_urban_physics = 1) and getting the error below.
WRF model version: 4.5.2
Simulation period: 22/05/2024_00 to 27/05/2024_18

Timing for Writing wrfout_d02_2024-05-22_00:00:00 for domain 2: 0.52564 elapsed seconds
Tile Strategy is not specified. Assuming 1D-Y
WRF TILE 1 IS 1 IE 208 JS 1 JE 205
WRF NUMBER OF TILES = 1
Timing for Writing wrfout_d03_2024-05-22_00:00:00 for domain 3: 1.76686 elapsed seconds
Tile Strategy is not specified. Assuming 1D-Y
WRF TILE 1 IS 1 IE 391 JS 1 JE 445
WRF NUMBER OF TILES = 1
Timing for main: time 2024-05-22_00:00:00 on domain 3: 24.30965 elapsed seconds

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x1507077cfd11 in ???
#1 0x1507077ceee5 in ???
#2 0x15070746808f in ???
at /build/glibc-B3wQXB/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#3 0x559e89fd505f in ???
#4 0x559e89fda624 in ???
#5 0x559e89fde6a2 in ???
#6 0x559e897bd065 in ???
#7 0x559e88fe55c5 in ???
#8 0x559e88a2c538 in ???
#9 0x559e888c1953 in ???
#10 0x559e8790368e in ???
#11 0x559e87903cfb in ???
#12 0x559e87903cfb in ???
#13 0x559e8788a5a7 in ???
#14 0x559e87889fde in ???
#15 0x150707449082 in __libc_start_main
at ../csu/libc-start.c:308
#16 0x559e8788a01d in ???
#17 0xffffffffffffffff in ???

I am also attaching the namelist.input, rsl.error, and rsl.out files. If anyone has encountered this type of error, please suggest a solution.
 

Attachments

  • rsl.out.0000 (2.2 MB)
  • rsl.error.0000 (324 KB)
  • namelist.input (3.9 KB)
I looked at your namelist.input and have a few concerns:

(1) time_step = 2 is too small for dx = 9 km. Please change it to 45.
(2) Please turn off the cumulus scheme for D02 and D03, i.e., cu_physics = 1, 0, 0.
(3) Set radt = 9, 9, 9.
(4) Set sf_urban_physics = 1, 1, 1, because this physics option must be the same for all domains. (The changed entries are sketched together below.)

In addition, this is a big case with a large number of grid points in D03. Please run with more processors to avoid memory issues.
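
A sketch of the changed namelist.input entries (only the modified settings are shown, assuming three domains; everything else stays as in your file):

Code:
&domains
 time_step        = 45,
 max_dom          = 3,
/

&physics
 cu_physics       = 1, 0, 0,
 radt             = 9, 9, 9,
 sf_urban_physics = 1, 1, 1,
/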
 
Thank you so much for replying.
I have made the corrections you mentioned, but I am still getting the same error.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x14634a37e692 in ???
#1 0x14634a37d825 in ???
#2 0x1463493b1acf in ???
#3 0x3009c15 in ???
#4 0x300dc09 in ???
#5 0x3012196 in ???
#6 0x272114c in ???
#7 0x1bda869 in ???
#8 0x1408648 in ???
#9 0x1218bcb in ???
#10 0x47b339 in ???
#11 0x47b999 in ???
#12 0x47b999 in ???
#13 0x406a71 in ???
#14 0x40605c in ???
#15 0x14634939dca2 in ???
#16 0x40609d in ???
#17 0xffffffffffffffff in ???

I am running this simulation on the institute's HPC; I am attaching the Slurm file here.

Slurm file for submitting the job on the HPC:

#!/bin/bash
#SBATCH --job-name=wrf_run # Job name
#SBATCH --partition=dgx # Partition to use
#SBATCH --ntasks=4 # Number of tasks
#SBATCH --cpus-per-task=1 # Number of CPU cores per task
#SBATCH --gres=gpu:1 # Include gpu for the task (only for gpu jobs)
#SBATCH --mem=16gb # Total memory limit
#SBATCH --time=48:00:00 # Time limit hrs:min:sec (optional)
#SBATCH --output=wrf_real_%j.log # Standard output and error log

echo "Job started on: $(date)"
echo "Running on node: $(hostname)"
echo "Working directory: $(pwd)"

# === Change to WRF run directory ===
cd /home/rs/sar/wrf/WRFV4.5/test/em_real/

# === Run WRF with mpirun ===
mpirun -np ${SLURM_NTASKS} ./wrf.exe

echo "Job ended on: $(date)"
 

Attachments

  • rsl.out.0000 (2.2 MB)
  • rsl.error.0000 (324 KB)
  • namelist.input (3.9 KB)
Hi, have you solved this problem?
 
Try calling this command before running WPS and WRF:

Bash:
ulimit -s unlimited

Sometimes this error occurs because of a memory (stack size) issue.
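
Note that ulimit only affects the shell it is run in, so in a batch job it has to go inside the job script itself, before the model launch. A minimal sketch (run directory and launcher as in your script; ${SLURM_NTASKS} assumes a Slurm job):

Bash:
#!/bin/bash
# Raise the stack size limit for this shell and its children;
# WRF keeps large automatic arrays on the stack.
ulimit -s unlimited

cd /home/rs/sar/wrf/WRFV4.5/test/em_real/
mpirun -np ${SLURM_NTASKS} ./wrf.exe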
 
Thank you for replying.

I have run this command (ulimit -s unlimited) before running WPS and WRF. I have also implemented what Ming said and made the corrections in the namelist.input file. This time I ran only the outer domain (9 km), but I am still getting the same error.

WRF TILE 1 IS 1 IE 100 JS 1 JE 99
WRF NUMBER OF TILES = 1
d01 2024-05-22_00:00:00 ----------------------------------------
d01 2024-05-22_00:00:00 W-DAMPING BEGINS AT W-COURANT NUMBER = 1.00000000
d01 2024-05-22_00:00:00 ----------------------------------------
Timing for main: time 2024-05-22_00:00:45 on domain 1: 1.57438 elapsed seconds

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x14cb85dd2d11 in ???
#1 0x14cb85dd1ee5 in ???
#2 0x14cb85a6b08f in ???
at /build/glibc-B3wQXB/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#3 0x55f9e19fb05f in ???
#4 0x55f9e1a00624 in ???
#5 0x55f9e1a046a2 in ???
#6 0x55f9e11e3065 in ???
#7 0x55f9e0a0b5c5 in ???
#8 0x55f9e0452538 in ???
#9 0x55f9e02e7953 in ???
#10 0x55f9df32968e in ???
#11 0x55f9df2b05a7 in ???
#12 0x55f9df2affde in ???
#13 0x14cb85a4c082 in __libc_start_main
at ../csu/libc-start.c:308
#14 0x55f9df2b001d in ???
#15 0xffffffffffffffff in ???


I am attaching the rsl.out.0000, rsl.error.0000, and namelist.input files.
 

Attachments

  • namelist.input (3.9 KB)
  • rsl.error.0000 (7.6 KB)
  • rsl.out.0000 (27.8 KB)
Looks like something is going on with real.exe.

First, run ./real.exe with however many cores you use, and zip all the rsl.error and rsl.out files together.

WRF overwrites the rsl.error and rsl.out files when it runs, so this will help diagnose the real.exe issue.

Next, run ./wrf.exe with however many cores you use, and zip all the rsl.error and rsl.out files together.

This will let me see what's going on.
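
If it helps, collecting the logs after each run can be as simple as this (a sketch using the standard rsl file names):

Bash:
# after real.exe finishes
zip real_rsl_logs.zip rsl.out.* rsl.error.*

# rerun wrf.exe, then collect its logs before they get overwritten again
zip wrf_rsl_logs.zip rsl.out.* rsl.error.*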
 
Thank you for replying.

I ran ./real.exe with 16 cores and zipped all the rsl.error and rsl.out files together.

Then I ran ./wrf.exe with 16 cores and zipped all the rsl.error and rsl.out files together.

I have attached the files here.

I am using the NCEP GDAS Final Analysis (FNL) dataset.
 

Attachments

  • real_rsl_logs.zip (127.9 KB)
  • wrf_rsl_logs.zip (44.3 KB)
  • namelist.input (3.9 KB)
Try making these changes:

D01 needs to be 100x100 grid points.
cu_physics = 11, 0, 0

Try that and see if it runs; I think one of those is the source of the problem.
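
In namelist terms, something like this (a sketch showing only the d01 values; the nest entries stay as they are):

Code:
&domains
 e_we = 100,   ! d01 grid points in x; keep the nest values unchanged
 e_sn = 100,   ! d01 grid points in y
/

&physics
 cu_physics = 11, 0, 0,
/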
 
Thank you for replying.

I have made these corrections but am still getting the same error.

$ mpirun -np 16 ./wrf.exe
Invalid MIT-MAGIC-COOKIE-1 key
starting wrf task 0 of 16
starting wrf task 1 of 16
starting wrf task 2 of 16
starting wrf task 3 of 16
starting wrf task 4 of 16
starting wrf task 5 of 16
starting wrf task 6 of 16
starting wrf task 7 of 16
starting wrf task 8 of 16
starting wrf task 9 of 16
starting wrf task 10 of 16
starting wrf task 11 of 16
starting wrf task 12 of 16
starting wrf task 13 of 16
starting wrf task 14 of 16
starting wrf task 15 of 16

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 579990 RUNNING AT user
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

If that doesn't work, you'll need to wait for @kwerner or @Ming Chen to respond, but at least they will have your rsl files handy.
I am attaching the zip files for real.exe and wrf.exe, which contain all the rsl.error and rsl.out files.

I am also attaching the namelist.wps and namelist.input files.
 

Attachments

  • real_rsl_logs.zip (108.4 KB)
  • wrf_rsl_logs.zip (44.2 KB)
  • namelist.input (3.9 KB)
  • namelist.wps (1.5 KB)
Try some of these suggestions; I am at a loss, sadly. My guess is that it is a physics or dynamics issue, but I don't know enough about it to fix it.
 
Have you tried increasing the memory limit in your bash file? 16 GB seems pretty small (a possible adjustment is sketched below the script).

#!/bin/bash
#SBATCH --job-name=wrf_run # Job name
#SBATCH --partition=dgx # Partition to use
#SBATCH --ntasks=4 # Number of tasks
#SBATCH --cpus-per-task=1 # Number of CPU cores per task
#SBATCH --gres=gpu:1 # Include gpu for the task (only for gpu jobs)
#SBATCH --mem=16gb # Total memory limit
#SBATCH --time=48:00:00 # Time limit hrs:min:sec (optional)
#SBATCH --output=wrf_real_%j.log # Standard output and error log
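
For example (a sketch; the right numbers depend on what the cluster's nodes offer):

Bash:
#SBATCH --ntasks=16    # more MPI ranks also spreads the memory across processes
#SBATCH --mem=64gb     # or --mem=0 to request all of the node's memory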
 
If the model crashed immediately after wrf.exe started, it could be because (1) your input data is wrong, or (2) you don't have sufficient memory to run this case. To determine which is the issue, let's try the following options:
(1) Run over a single domain (i.e., max_dom = 1). If it works, we know for sure that both the input data and the model are fine.
(2) If the single-domain case fails, we need to look at the input data.
(3) Please recompile WRF in debug mode, then rerun this case (the rebuild commands are sketched below). The log file will tell you exactly when and where the model first crashes, which will give you more hints for debugging possible issues.
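
For step (3), the debug rebuild usually looks something like this (a sketch; check ./configure's options for your WRF version):

Bash:
cd /home/rs/sar/wrf/WRFV4.5
./clean -a
./configure -D          # debug build with bounds checking and run-time traps
./compile em_real >& compile_debug.log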
 
Thank you for replying.
Actually, when I run the WRF model without the urban physics scheme, or for other simulation periods, WRF runs successfully. I get the error only when I run with the urban physics scheme for this simulation period (22/05/2024_00 to 27/05/2024_18), so maybe it is an input data error.
For the other simulation periods I use the same NCEP FNL data and do not get an error (without the urban physics scheme).
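
One quick sanity check along those lines, assuming standard NetCDF tools are available on the system (field names here assume the standard WRF urban setup):

Bash:
# list the urban-related fields in the model input
ncdump -h wrfinput_d01 | grep -i urb

# inspect the land-use field for obviously bad values
ncdump -v LU_INDEX wrfinput_d01 | tail -n 40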
 