forrtl: severe (174): SIGSEGV, segmentation fault occurred

subin

Hello, I am a student who is studying WRF.

While running wrf.exe I'm encountering an error "forrtl: severe (174): SIGSEGV, segmentation fault occurred".
The run uses ERA5 and OISST data; with only ERA5 data, under the same conditions, there was no problem. I have already run grep cfl rsl* and it shows nothing.

The version of WRF is 4.5.2 and WPS is 4.5.
I have attached namelist.input, namelist.wps, rsl.error.0000, rsl.out.0000, rsl.error.0015, and rsl.out.0075.
I really appreciate any help you can provide.

The error in rsl.error.0015 is as follows:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
wrf.exe 00000000032BE48D for__signal_handl Unknown Unknown
libpthread-2.17.s 00002B8CBA1C46D0 Unknown Unknown Unknown
wrf.exe 0000000002E3B032 Unknown Unknown Unknown
wrf.exe 0000000002E36414 Unknown Unknown Unknown
wrf.exe 0000000002E340A2 Unknown Unknown Unknown
wrf.exe 0000000002709E98 Unknown Unknown Unknown
wrf.exe 00000000026D8F08 Unknown Unknown Unknown
wrf.exe 0000000001F6B907 Unknown Unknown Unknown
wrf.exe 000000000175F4E1 Unknown Unknown Unknown
wrf.exe 0000000001546FE8 Unknown Unknown Unknown
wrf.exe 00000000005913C5 Unknown Unknown Unknown
wrf.exe 00000000005919E2 Unknown Unknown Unknown
wrf.exe 00000000005919E2 Unknown Unknown Unknown
wrf.exe 0000000000410AC1 Unknown Unknown Unknown
wrf.exe 0000000000410A7F Unknown Unknown Unknown
wrf.exe 0000000000410A1E Unknown Unknown Unknown
libc-2.17.so 00002B8CBA3F3445 __libc_start_main Unknown Unknown
wrf.exe 0000000000410929 Unknown
 

Attachments

  • namelist.input (4.1 KB)
  • namelist.wps (782 bytes)
  • rsl.error.0000 (13.8 KB)
  • rsl.out.0000 (14 KB)
Code:
grep -i FATAL rsl.*


grep -i error rsl.*


grep -i SIGSEGV rsl.*


grep -i cfl rsl.*


These grep commands might help you narrow down which rsl file contains the error.
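If it is easier to check all four patterns in one pass, here is a minimal Python sketch of the same idea. The function name `scan_rsl` and the default keyword list are just illustrative choices (not part of WRF); it assumes the rsl.* files sit in the run directory and does a case-insensitive substring match, like `grep -i`.

```python
from pathlib import Path


def scan_rsl(run_dir, keywords=("FATAL", "error", "SIGSEGV", "cfl")):
    """Map each keyword to the sorted rsl.* file names whose contents contain it.

    Matching is case-insensitive on file contents, mirroring `grep -i`.
    """
    hits = {kw: [] for kw in keywords}
    for path in sorted(Path(run_dir).glob("rsl.*")):
        # errors="replace" keeps the scan going if a log has odd bytes in it
        text = path.read_text(errors="replace").lower()
        for kw in keywords:
            if kw.lower() in text:
                hits[kw].append(path.name)
    return hits


if __name__ == "__main__":
    # Run from the directory that holds the rsl.* files
    for kw, files in scan_rsl(".").items():
        print(f"{kw}: {len(files)} file(s)", *files[:10])
```

An empty list for every keyword except SIGSEGV would match what was reported in this thread: the crash leaves no other message in the logs.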
 
Thank you for your answer. I've already tried words other than SIGSEGV, but it shows nothing.
I'd like to know what the problem is. 😭
 
A few questions:

1. What is your data source?
2. Can you upload all the rsl.out files as a zip file?
3. Can you upload all the rsl.error files as a zip file?
4. Did real.exe run successfully and create all the input and boundary files needed?
 
The data used are ERA5 and OISST data, and the rsl file is attached as zip.
Real.exe ran well, and wrfinput and wrflowinp were formed.
Thank you so much for your help.
 

Attachments

  • rsl.zip (483.8 KB)
@subin

Looking at the rsl files for ranks 15 and 75, the error is this:

Code:
INITIALIZE THREE Noah LSM RELATED TABLES
 Tile Strategy is not specified. Assuming 1D-Y
WRF TILE   1 IS    151 IE    200 JS     51 JE    100
WRF NUMBER OF TILES =   1
 Tile Strategy is not specified. Assuming 1D-Y
WRF TILE   1 IS    252 IE    334 JS    101 JE    200
WRF NUMBER OF TILES =   1
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
wrf.exe            00000000032BE48D  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002AF525DBD6D0  Unknown               Unknown  Unknown
wrf.exe            0000000002E3B032  Unknown               Unknown  Unknown
wrf.exe            0000000002E36414  Unknown               Unknown  Unknown
wrf.exe            0000000002E340A2  Unknown               Unknown  Unknown
wrf.exe            0000000002709E98  Unknown               Unknown  Unknown
wrf.exe            00000000026D8F08  Unknown               Unknown  Unknown
wrf.exe            0000000001F6B907  Unknown               Unknown  Unknown
wrf.exe            000000000175F4E1  Unknown               Unknown  Unknown
wrf.exe            0000000001546FE8  Unknown               Unknown  Unknown
wrf.exe            00000000005913C5  Unknown               Unknown  Unknown
wrf.exe            00000000005919E2  Unknown               Unknown  Unknown
wrf.exe            00000000005919E2  Unknown               Unknown  Unknown
wrf.exe            0000000000410AC1  Unknown               Unknown  Unknown
wrf.exe            0000000000410A7F  Unknown               Unknown  Unknown
wrf.exe            0000000000410A1E  Unknown               Unknown  Unknown
libc-2.17.so       00002AF525FEC445  __libc_start_main     Unknown  Unknown
wrf.exe            0000000000410929  Unknown               Unknown  Unknown

WRF TILE   1 IS    252 IE    334 JS    601 JE    700
WRF NUMBER OF TILES =   1
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
wrf.exe            00000000032BE48D  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002B8CBA1C46D0  Unknown               Unknown  Unknown
wrf.exe            0000000002E3B032  Unknown               Unknown  Unknown
wrf.exe            0000000002E36414  Unknown               Unknown  Unknown
wrf.exe            0000000002E340A2  Unknown               Unknown  Unknown
wrf.exe            0000000002709E98  Unknown               Unknown  Unknown
wrf.exe            00000000026D8F08  Unknown               Unknown  Unknown
wrf.exe            0000000001F6B907  Unknown               Unknown  Unknown
wrf.exe            000000000175F4E1  Unknown               Unknown  Unknown
wrf.exe            0000000001546FE8  Unknown               Unknown  Unknown
wrf.exe            00000000005913C5  Unknown               Unknown  Unknown
wrf.exe            00000000005919E2  Unknown               Unknown  Unknown
wrf.exe            00000000005919E2  Unknown               Unknown  Unknown
wrf.exe            0000000000410AC1  Unknown               Unknown  Unknown
wrf.exe            0000000000410A7F  Unknown               Unknown  Unknown
wrf.exe            0000000000410A1E  Unknown               Unknown  Unknown
libc-2.17.so       00002B8CBA3F3445  __libc_start_main     Unknown  Unknown
wrf.exe            0000000000410929  Unknown               Unknown  Unknown

These errors usually occur when the program tries to access memory that it doesn't have access to, or when too much data is being accessed at the same time.

What I would do is run this command in the terminal before executing wrf.exe:

Bash:
ulimit -s unlimited

This might fix this issue.
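The limit only applies to the shell where you set it and to processes started from that shell, so it has to be set in the same session or job script that launches wrf.exe. As a quick sanity check, here is a small sketch using Python's standard `resource` module (Unix only); the helper name `describe_stack_limit` is just for illustration. Run it from the same shell, right before wrf.exe, to see what limit a child process actually inherits.

```python
import resource


def describe_stack_limit():
    """Return the (soft, hard) stack limits as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def fmt(value):
        # getrlimit reports bytes; `ulimit -s` reports units of 1024 bytes
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value // 1024} kB"

    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = describe_stack_limit()
    print(f"stack limit: soft={soft}, hard={hard}")
```

If the soft limit does not print as "unlimited" after the ulimit command, the setting did not reach this process, and it will not reach wrf.exe either.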

Few things about your namelist that might help too:

Change your time step to roughly 6*dx (dx in km). I would recommend 45 for the time step, even though 54 is the true 6*dx. I chose 45 so that all the files are written on the hour instead of 14 seconds after. You are using a 1:3 time step ratio, so the time steps for the three domains would land like this: 45, 15, 5.
Code:
 &domains
 time_step                           = 45,
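
For anyone who wants to redo this arithmetic for another grid, here is a rough Python sketch. It assumes dx = 9 km for d01 (inferred from 6*dx = 54), a 1:3 ratio for both nests, and hourly output; the function names are made up for illustration. The extra condition that every nest step be a whole number of seconds is my own simplification, not a WRF requirement I am quoting.

```python
def suggest_time_step(dx_km, ratio=3, n_domains=3, interval_s=3600):
    """Largest step (s) not above 6*dx that divides the output interval
    evenly and still gives whole-second steps on every nest."""
    upper = int(6 * dx_km)
    nest_factor = ratio ** (n_domains - 1)  # 9 for two nests at 1:3
    for step in range(upper, 0, -1):
        if interval_s % step == 0 and step % nest_factor == 0:
            return step
    return None  # no step satisfies both conditions


def nest_steps(parent_step, ratio=3, n_domains=3):
    """Time step on each domain, outermost first."""
    return [parent_step // ratio**i for i in range(n_domains)]


if __name__ == "__main__":
    step = suggest_time_step(9)   # 54 fails: 3600 is not a multiple of 54
    print(step, nest_steps(step))
```

With these assumptions the search lands on 45 for the parent, and a 1:3 ratio then gives 45, 15 and 5 seconds on d01, d02 and d03.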

I'm not sure whether you wanted one-way nesting, but if you change this value to 1, the nests will do two-way nesting:
Code:
feedback                            = 0,
feedback                            = 1,


If you want, you can turn this one on to help fix any CFL errors that might occur, but it is optional for the initial tests:
Code:
w_damping                           = 0,
w_damping                           = 1,


The domain looks good to me; I don't see any issues. It's just extremely large, but you seem to have plenty of cores to run it. I think the ulimit command will fix the problem, so try that first, along with adjusting the time step.


One last thing: if you are using ERA5 data as the boundary conditions, make sure that you either download the full file or, if you take a subset, download an area larger than your WRF domain so that there aren't any issues. I usually go 10 degrees beyond my WRF domain for boundary file subsets.
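
A small sketch of that padding rule, in case it helps. The corner values in the example are hypothetical (they are not taken from the attached namelist.wps), the function name is made up, and the code simply clamps to valid latitude/longitude ranges; it does not handle domains that cross the dateline.

```python
def padded_bbox(north, west, south, east, pad_deg=10.0):
    """Expand a lat/lon box by pad_deg on every side, clamped to valid ranges.

    Returns (north, west, south, east) in degrees.
    """
    return (
        min(north + pad_deg, 90.0),
        max(west - pad_deg, -180.0),
        max(south - pad_deg, -90.0),
        min(east + pad_deg, 180.0),
    )


if __name__ == "__main__":
    # Hypothetical outer-domain corners, for illustration only
    print(padded_bbox(north=45, west=120, south=25, east=140))
```

Use the corners of your outermost domain (d01) as input, since that is the domain the boundary data has to cover.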
 
@subin
Another very likely culprit for the segmentation fault is the fact that you're only using 144 processors. Given the size of domain 3 (d03):


e_we = 300, 601, 1000,
e_sn = 300, 601, 1201,

You likely need to try using many more processors than that. The catch is that you can't use so many that the count becomes problematic for d01, but depending on how each node is set up on your system, you should be able to use several hundred processors and d01 will still be okay. See Choosing an Appropriate Number of Processors.
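
To get a feel for the trade-off, here is a rough Python sketch that estimates how many grid points each processor gets per domain. It assumes a near-square processor layout and plain integer division, which is only an approximation of how WRF actually splits the grid; the helper names and the 400-processor case are illustrative, and the linked page has the real guidance on acceptable sizes.

```python
import math


def near_square_factors(n):
    """Split n into (px, py) with px <= py and px as close to sqrt(n) as possible."""
    px = math.isqrt(n)
    while n % px:
        px -= 1
    return px, n // px


def approx_tile(e_we, e_sn, nproc_x, nproc_y):
    """Approximate grid points per processor in x and y (integer division)."""
    return e_we // nproc_x, e_sn // nproc_y


# e_we / e_sn values quoted above for d01, d02, d03
domains = {"d01": (300, 300), "d02": (601, 601), "d03": (1000, 1201)}

if __name__ == "__main__":
    for nprocs in (144, 400):
        px, py = near_square_factors(nprocs)
        print(f"{nprocs} processors as {px} x {py}:")
        for name, (e_we, e_sn) in domains.items():
            print(f"  {name}: ~{approx_tile(e_we, e_sn, px, py)} points per processor")
```

For 144 processors this gives about 83 x 100 points per processor on d03, which lines up with the tile bounds printed in the rsl files above (IS 252, IE 334, JS 101, JE 200). Raising the count shrinks the d03 tiles, but it shrinks the d01 tiles too, and that is the limit on how far you can go.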
 