
wrf.exe crashes just after first output: 'Caught signal 11' in HALO_CUP_G3_IN_inline.inc

Afernandez

Hi,

I'm trying to run a 54-hour simulation with 3 nested domains, driven by historical MPI-ESM1-2-HR data, with hourly output. The model crashes while running the outermost domain (d01), a few seconds after writing the first wrfout file. A few days ago I ran the same setup with another set of data from the same model, and it worked thanks to suggestions from @William.Hatheway (this post). In this new simulation, however, I keep getting "Caught signal 11 (Segmentation fault...)" while the model is running "HALO_CUP_G3_IN_inline.inc mskf_cps_mp". I created a set of wrfinput files from MPI-ESM1-2-HR data with nearly the same characteristics as the previous test case. The biggest change is that I'm using 14 instead of 15 metgrid levels because of restrictions in my wind data (I had to interpolate between geopotential levels to match the levels of the other 3D variables). real.exe runs fine, but wrf.exe stops during the first time step. I checked my met_em, wrfbdy, and wrfinput files and they all look OK, but 19 of my rsl files end with the following lines, with the crash occurring right after "in mskf_cps_mp":

d01 2005-01-01_06:00:00 call cumulus_driver
d01 2005-01-01_06:00:00 calling inc/HALO_CUP_G3_IN_inline.inc
d01 2005-01-01_06:00:00 in mskf_cps_mp
[f0524:155901:0:155901] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xfffffffe06a54480)
==== backtrace (tid: 155901) ====
0 /lib64/libucs.so.0(ucs_handle_error+0x2dc) [0x14f905665e4c]
1 /lib64/libucs.so.0(+0x2c02c) [0x14f90566602c]
2 /lib64/libucs.so.0(+0x2c1fa) [0x14f9056661fa]
3 /home/titan/gwgk/gwgk101h/WRF_MODEL/WRF/test/em_real/wrf.exe() [0x2f74384]
4 /home/titan/gwgk/gwgk101h/WRF_MODEL/WRF/test/em_real/wrf.exe() [0x2f4d98a]
5 /home/titan/gwgk/gwgk101h/WRF_MODEL/WRF/test/em_real/wrf.exe() [0x286471f]
6 /home/titan/gwgk/gwgk101h/WRF_MODEL/WRF/test/em_real/wrf.exe() [0x1fa5d10]
7 /home/titan/gwgk/gwgk101h/WRF_MODEL/WRF/test/em_real/wrf.exe() [0x1736c5b]
8 /home/titan/gwgk/gwgk101h/WRF_MODEL/WRF/test/em_real/wrf.exe() [0x1512b28]
9 /home/titan/gwgk/gwgk101h/WRF_MODEL/WRF/test/em_real/wrf.exe() [0x5b6b61]
10 /home/titan/gwgk/gwgk101h/WRF_MODEL/WRF/test/em_real/wrf.exe() [0x4145a1]
11 /home/titan/gwgk/gwgk101h/WRF_MODEL/WRF/test/em_real/wrf.exe() [0x414554]
12 /home/titan/gwgk/gwgk101h/WRF_MODEL/WRF/test/em_real/wrf.exe() [0x4144e2]
13 /lib64/libc.so.6(__libc_start_main+0xe5) [0x14fa934a2d85]
14 /home/titan/gwgk/gwgk101h/WRF_MODEL/WRF/test/em_real/wrf.exe() [0x4143ee]
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libpnetcdf.so.4.0 000014FA97505359 for__signal_handl Unknown Unknown
libpthread-2.28.s 000014FA93A43CF0 Unknown Unknown Unknown
wrf.exe 0000000002F74384 Unknown Unknown Unknown
wrf.exe 0000000002F4D98A Unknown Unknown Unknown
wrf.exe 000000000286471F Unknown Unknown Unknown
wrf.exe 0000000001FA5D10 Unknown Unknown Unknown
wrf.exe 0000000001736C5B Unknown Unknown Unknown
wrf.exe 0000000001512B28 Unknown Unknown Unknown
wrf.exe 00000000005B6B61 Unknown Unknown Unknown
wrf.exe 00000000004145A1 Unknown Unknown Unknown
wrf.exe 0000000000414554 Unknown Unknown Unknown
wrf.exe 00000000004144E2 Unknown Unknown Unknown
libc-2.28.so 000014FA934A2D85 __libc_start_main Unknown Unknown
wrf.exe 00000000004143EE Unknown Unknown Unknown

I have attached rsl.error.0000 from real.exe, rsl.error.0358 from wrf.exe (one of the files showing this issue), and my namelist.input.

Any help is truly appreciated.
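To see which ranks hit the segfault (19 rsl files in this case), a short script can scan every rsl.error.* file for the "Caught signal 11" and "SIGSEGV" markers shown in the log and report the last debug line each rank printed before the crash. This is a minimal Python sketch, assuming the rsl files are in the current run directory and that debug lines start with the domain and a timestamp as in the log above; scan_rsl_lines and scan_rsl_dir are hypothetical helpers, not part of WRF.

```python
import glob
import os
import re

# Markers written to rsl.error.* when a rank segfaults (as in the log above).
SEGV_MARKERS = ("Caught signal 11", "SIGSEGV")
# Debug lines look like "d01 2005-01-01_06:00:00 in mskf_cps_mp" (assumed format).
DEBUG_LINE = re.compile(r"^d\d\d \d{4}-\d\d-\d\d_\d\d:\d\d:\d\d ")


def scan_rsl_lines(lines):
    """Return (crashed, last debug line printed before the first crash marker)."""
    last_debug = None
    for raw in lines:
        line = raw.rstrip("\n")
        if any(marker in line for marker in SEGV_MARKERS):
            return True, last_debug
        if DEBUG_LINE.match(line):
            last_debug = line
    return False, last_debug


def scan_rsl_dir(run_dir="."):
    """Map each crashed rsl.error.* file name to its last debug line."""
    crashed = {}
    for path in sorted(glob.glob(os.path.join(run_dir, "rsl.error.*"))):
        with open(path, errors="replace") as fh:
            hit, last_debug = scan_rsl_lines(fh)
        if hit:
            crashed[os.path.basename(path)] = last_debug
    return crashed


if __name__ == "__main__":
    result = scan_rsl_dir(".")
    print(f"{len(result)} rsl.error file(s) report a segmentation fault")
    for name, last_debug in result.items():
        print(f"{name}: {last_debug}")
```

Running it from the run directory lists the crashed ranks, so you can check whether they all stop in the same routine.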
 

Attachments

  • namelist.input
    7 KB
  • rsl.error.0000
    3.1 MB
  • rsl.error.0358.txt
    578.4 KB
Hi,

Thank you very much for the reply. I have reduced the number of processors several times, but it still crashes. I also tried modifying my input data, but nothing has worked. Is there anything else I could try?
 
Can you send the latest rsl files for a run after you've reduced the number of processors? Please package all rsl* files into a single *.tar file and attach that. Please let me know the different numbers of processors you've tried, as well. Thanks!
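A minimal Python sketch for packaging the rsl files from one or more test runs into a single archive, keeping one folder per run (for example, one per processor count) so files with the same name do not overwrite each other. The function name and the "360_cores" directory in the test are hypothetical. Mode "w:gz" writes a gzip-compressed tar; use "w" instead for a plain *.tar.

```python
import glob
import os
import tarfile


def package_rsl_runs(run_dirs, archive_path):
    """Put every rsl.* file from each run directory into one .tar.gz archive.

    Each run keeps its own folder inside the archive ("<run>/rsl.error.0000"),
    so runs with different processor counts stay separate.
    Returns the archive member names that were added.
    """
    # Collect files first so no empty archive is written if nothing matches.
    members = []
    for run_dir in run_dirs:
        run_name = os.path.basename(os.path.normpath(run_dir))
        for path in sorted(glob.glob(os.path.join(run_dir, "rsl.*"))):
            members.append((path, f"{run_name}/{os.path.basename(path)}"))
    if not members:
        raise FileNotFoundError("no rsl.* files found in the given directories")
    with tarfile.open(archive_path, "w:gz") as tar:
        for path, arcname in members:
            tar.add(path, arcname=arcname)
    return [arcname for _, arcname in members]
```

Write the archive outside the run directories; otherwise a later call would pick up the archive itself through the rsl.* pattern.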
 
Thank you. I have attached the files; the directory names indicate the number of cores I tested. I tried both the maximum (~d03) and the minimum (~d01) decomposition, following what I read in other posts (Choosing an Appropriate Number of Processors). These rsl files were written with debug_level = 0, in case they are useful. I also have a version with debug_level = 1000, but those files are too big to attach.
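The "Choosing an Appropriate Number of Processors" guidance referenced above is usually quoted as: the smallest domain sets the most processors, about (e_we/25) × (e_sn/25), and the largest domain sets the fewest, about (e_we/100) × (e_sn/100). This is a minimal Python sketch of that rule of thumb; the divisors are as we recall the guidance and should be treated as assumptions, and the grid sizes in the example are hypothetical, not taken from the attached namelist.input.

```python
import math
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    e_we: int  # grid points west-east (namelist e_we)
    e_sn: int  # grid points south-north (namelist e_sn)


def processor_range(domains):
    """Rule-of-thumb (min, max) MPI task counts for a set of nested domains.

    Assumed guidance: the largest domain (by grid points) sets the minimum,
    (e_we/100) * (e_sn/100); the smallest sets the maximum, (e_we/25) * (e_sn/25).
    """
    largest = max(domains, key=lambda d: d.e_we * d.e_sn)
    smallest = min(domains, key=lambda d: d.e_we * d.e_sn)
    min_procs = max(1, math.floor((largest.e_we / 100) * (largest.e_sn / 100)))
    max_procs = max(1, math.floor((smallest.e_we / 25) * (smallest.e_sn / 25)))
    return min_procs, max_procs


if __name__ == "__main__":
    # Hypothetical grid sizes for illustration only.
    doms = [Domain("d01", 400, 300), Domain("d02", 301, 301), Domain("d03", 251, 251)]
    print(processor_range(doms))  # (12, 100)
```

If the minimum comes out larger than the maximum, the domain sizes differ too much for one processor count to suit all of them.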
 

Attachments

  • rsl.tar.gz
    1 MB
Hi,

This is just to let you know that I fixed the wrf.exe crashes. The cause was a bad interpolation in one of the scripts I used to preprocess the 3D winds in my input data. The short test case now runs almost without issues. I still get some CFL errors, which I'm trying to fix by adjusting the time step, but at least it runs now.
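The preprocessing script itself is not shown in the thread, so this is only a hypothetical sketch of the kind of step described earlier: interpolating one wind column from the heights implied by geopotential onto target heights (for example, the heights of the levels used by the other 3D variables), with checks that would catch common interpolation bugs. It assumes linear interpolation in height (a log-pressure scheme is an equally valid choice), levels ordered bottom-up, and a 150 m/s plausibility limit; all names are illustrative.

```python
import bisect
import math

G0 = 9.80665  # standard gravity (m s-2)


def geopotential_to_height(phi):
    """Convert geopotential (m2 s-2) to geopotential height (m)."""
    return [p / G0 for p in phi]


def interp_column(src_z, src_u, tgt_z):
    """Linearly interpolate one wind column from src_z to tgt_z heights (m).

    src_z must be strictly increasing (bottom-up). Targets outside the source
    range raise ValueError instead of being silently extrapolated.
    """
    if len(src_z) != len(src_u) or len(src_z) < 2:
        raise ValueError("need matching src_z/src_u with at least two levels")
    if any(b <= a for a, b in zip(src_z, src_z[1:])):
        raise ValueError("source heights are not strictly increasing")
    if any(math.isnan(v) for v in src_u):
        raise ValueError("NaN in source wind column")
    out = []
    for z in tgt_z:
        if z < src_z[0] or z > src_z[-1]:
            raise ValueError(f"target height {z} m is outside the source range")
        i = bisect.bisect_right(src_z, z) - 1
        i = min(i, len(src_z) - 2)  # keep z == src_z[-1] in the last interval
        w = (z - src_z[i]) / (src_z[i + 1] - src_z[i])
        out.append(src_u[i] + w * (src_u[i + 1] - src_u[i]))
    return out


def check_plausible(values, limit_ms=150.0):
    """Raise if any wind component is NaN or exceeds the assumed limit (m/s)."""
    for v in values:
        if math.isnan(v) or abs(v) > limit_ms:
            raise ValueError(f"implausible wind value {v} m/s")
    return values
```

Rejecting top-down ordering and extrapolation, instead of quietly accepting them, is deliberate: either one can produce wind fields that look fine in a quick plot but still break the model.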

Thanks
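For the CFL errors, the usual WRF rule of thumb is a d01 time_step (in seconds) of about 6 × dx (in km), and going below that is a common first response when CFL errors appear. This is a minimal Python sketch, assuming parent_time_step_ratio equals parent_grid_ratio for every nest; the function name and the 27 km example are hypothetical and do not come from this thread's namelist.

```python
import math


def recommended_time_steps(dx_m, parent_grid_ratios, factor=6.0):
    """Rule-of-thumb time steps (s) for d01 and each nest.

    dx_m: d01 grid spacing in metres (namelist dx).
    parent_grid_ratios: namelist parent_grid_ratio, e.g. [1, 3, 3].
    factor: seconds per km of grid spacing (6 is the usual guideline;
            a smaller value gives a shorter, more CFL-safe step).
    Assumes parent_time_step_ratio == parent_grid_ratio for each nest.
    """
    if dx_m <= 0 or factor <= 0:
        raise ValueError("dx_m and factor must be positive")
    # namelist time_step is a whole number of seconds for d01
    steps = [float(math.floor(factor * dx_m / 1000.0))]
    for ratio in parent_grid_ratios[1:]:
        steps.append(steps[-1] / ratio)
    return steps
```

For example, recommended_time_steps(27000, [1, 3, 3]) gives 162 s for d01, and reducing factor to 5 gives 135 s.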
 