
forrtl: severe (174): SIGSEGV, segmentation fault occurred in wrf.exe

Hi all,
While running wrf.exe I'm encountering the error forrtl: severe (174): SIGSEGV, segmentation fault occurred after simulating 2 days.
The error does not appear immediately; it always occurs after about two days of simulation, exactly when the model reaches 2017-11-29_21:52:30.
I'm running the model on 64 cores, and rsl.error.0063 shows:
Code:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line     Source
wrf.exe            0000000002F7920D  for__signal_handl  Unknown  Unknown
libpthread-2.17.s  00002B0A09D666D0  Unknown            Unknown  Unknown
wrf.exe            0000000002B54337  Unknown            Unknown  Unknown
wrf.exe            0000000002B55322  Unknown            Unknown  Unknown
wrf.exe            0000000002B4EC5B  Unknown            Unknown  Unknown
wrf.exe            0000000002B4D09F  Unknown            Unknown  Unknown
wrf.exe            00000000024057F0  Unknown            Unknown  Unknown
wrf.exe            0000000001AA1206  Unknown            Unknown  Unknown
wrf.exe            00000000013EB139  Unknown            Unknown  Unknown
wrf.exe            0000000001269F50  Unknown            Unknown  Unknown
wrf.exe            0000000000553453  Unknown            Unknown  Unknown
wrf.exe            000000000040C981  Unknown            Unknown  Unknown
wrf.exe            000000000040C93F  Unknown            Unknown  Unknown
wrf.exe            000000000040C8DE  Unknown            Unknown  Unknown
libc-2.17.so       00002B0A09F95445  __libc_start_main  Unknown  Unknown
wrf.exe            000000000040C7E9  Unknown            Unknown  Unknown

I'm using ERA5 data with the Vtable Vtable.ERA-interim.pl.
Both WRF and WPS are version 4.0.
I've attached namelist.input, namelist.wps, rsl.error.0000, rsl.out.0000, rsl.error.0063, and rsl.out.0063.
I have already run grep cfl rsl* and it returns nothing, so there are no CFL errors.
I tried sf_urban_physics = 2 and 0; neither works.
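
For reference, these checks can be run as follows (standard shell usage; the tail of the failing rank's log often shows the last thing the model did before crashing):

Code:
grep -i cfl rsl.error.*     # case-insensitive CFL check across all error logs
tail -n 20 rsl.error.0063   # last messages before the crash on the failing rank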

Please have a look.
I really appreciate any help you can provide.
 

Attachments

  • rslerrors_namelists.zip (208.2 KB)
I ran the model on our HPC system (Linux cloud0 3.10.0-862.el7.x86_64, Intel compiler) using the following command:

Code:
nohup mpirun -machinefile /home/akash/mvapich2.hosts -np 48 wrf.exe &
I also tried 48 cores instead of 64, but the error was the same.

My mvapich2.hosts file on the HPC contains:

Code:
cloud0:32
cloud1:32
cloud2:32
Can I use only 32 cores?
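
For reference, restricting the run to cloud0's 32 slots might look like this (a hypothetical sketch; MVAPICH2's mpirun fills hosts in the order listed, up to the count after the colon):

Code:
# hypothetical single-node hostfile using only cloud0's 32 slots
echo "cloud0:32" > /home/akash/mvapich2.single.hosts
nohup mpirun -machinefile /home/akash/mvapich2.single.hosts -np 32 wrf.exe &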

The error message in nohup.out is:

Code:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 421578 RUNNING AT cloud1
= EXIT CODE: 174
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0@cloud0] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:909): assert (!closed) failed
[proxy:0:0@cloud0] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:0@cloud0] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@cloud0] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@cloud0] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@cloud0] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@cloud0] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion

I look forward to responses from the experts.
Thanks in advance.
 
Your namelist.input indicates that you are running this 10-day simulation with the thermal diffusion scheme. However, this is a relatively simple scheme that doesn't account for vegetation effects, and soil moisture is fixed at a landuse- and season-dependent constant value. As a result, it can produce unrealistic simulations and crash the model.
Can you change the LSM option and try again?
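
For example, switching to the Noah LSM might look like this in namelist.input (a minimal sketch; option 2 expects num_soil_layers = 4, and the value should be set for every domain):

Code:
&physics
 sf_surface_physics = 2,    ! Noah LSM; thermal diffusion is option 1
 num_soil_layers    = 4,    ! must match the chosen LSM; Noah uses 4
/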
 

Hi @Ming Chen
Thank you for your reply.
I changed sf_surface_physics to 2, 3, 4, and 5.
With each of these schemes, real.exe always crashes with the following error:

Code:
Using sfcprs to compute psfc
d01 2017-11-27_00:00:00 No average surface temperature for use with inland lakes
Assume non-CLM input
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 3062
grid%tsk unreasonable
-------------------------------------------
[cli_11]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 11
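
A quick way to check whether the skin temperature feeding real.exe is sensible is to dump it from the metgrid output (a sketch, assuming NetCDF met_em files with the standard SKINTEMP field and the start date above):

Code:
ncdump -v SKINTEMP met_em.d01.2017-11-27_00:00:00.nc | tail -n 40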

Only after solving this error can I move forward and run wrf.exe.
I'm grateful for every suggestion.
I've attached the relevant files here.
Thanks in advance.
 

Attachments

  • rsl.error.0000.txt (1.2 KB)
  • rsl.error.0011.txt (1.4 KB)
  • rsl.out.0000.txt (21.7 KB)
  • rsl.out.0011.txt (22 KB)
  • namelist.wps (807 bytes)
  • namelist.input (4.5 KB)
The error message from real.exe indicates that your input data is wrong.

Can you tell me where you downloaded the ERA5 data? Which Vtable did you use to ungrib it?

Note that ERA5 is a little more complicated to use. Could you switch to GFS instead?
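
For reference, a typical ungrib sequence for pressure-level ERA5 GRIB files looks roughly like this (a sketch; Vtable.ECMWF ships with WPS and is commonly used for ERA5, and the data path is only an example):

Code:
ln -sf ungrib/Variable_Tables/Vtable.ECMWF Vtable
./link_grib.csh /path/to/era5/*.grib
./ungrib.exe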
 
Hi @Ming Chen
Thanks for your reply.
The error was related to geogrid.exe: the static geographic data was incomplete, although geogrid.log didn't report any error.
## Error
1. The geo_em files show unreliable values for the LANDMASK, SKINTEMP, SOILTEMP, HGT_M, and ALBEDO12M variables.
2. These bad values propagate into the met_em files.
3. As a result, real.exe can't read valid inputs from the met_em files when sf_surface_physics = 2, 3, 4, or 5 (an LSM).
## Solution
1. Make sure the path to the GEOG files is correct, e.g. /home/user/WPS_GEOG/ (a path starting with ~/WPS_GEOG/ sometimes won't work); see the namelist sketch below.
2. Re-run geogrid.exe, ungrib.exe, and metgrid.exe.
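
A sketch of the relevant namelist.wps entry (the path is only an example):

Code:
&geogrid
 geog_data_path = '/home/user/WPS_GEOG/'  ! use an absolute path; '~' may not be expanded
/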

My error has been solved, and I'm sharing the solution here for the public.
 