(RESOLVED) Segmentation fault (11) exit

This post is from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled. If you have follow-up questions related to this post, please start a new thread from the forum home page.

Hi all,

When I ran the WRF model on 3 domains (15, 5 and 1 km, respectively), “wrf.exe” exited with this error:

[hpc-16-01:18706] *** Process received signal ***
[hpc-16-01:18706] Signal: Segmentation fault (11)
[hpc-16-01:18706] Signal code: Address not mapped (1)
[hpc-16-01:18706] Failing at address: 0xfffffffe03a5bac0
[hpc-16-01:18706] [ 0] /lib64/libpthread.so.0(+0xf630)[0x7f8ffbf08630]
[hpc-16-01:18706] [ 1] ./wrf.exe[0x155f1c8]
[hpc-16-01:18706] [ 2] ./wrf.exe[0x1568da0]
[hpc-16-01:18706] [ 3] ./wrf.exe[0x156a46a]
[hpc-16-01:18706] [ 4] ./wrf.exe[0x156d1ed]
[hpc-16-01:18706] [ 5] ./wrf.exe[0x1116827]
[hpc-16-01:18706] [ 6] ./wrf.exe[0x1216267]
[hpc-16-01:18706] [ 7] ./wrf.exe[0xcb7375]
[hpc-16-01:18706] [ 8] ./wrf.exe[0xbc17a6]
[hpc-16-01:18706] [ 9] ./wrf.exe[0x463d23]
[hpc-16-01:18706] [10] ./wrf.exe[0x46416b]
[hpc-16-01:18706] [11] ./wrf.exe[0x46416b]
[hpc-16-01:18706] [12] ./wrf.exe[0x4056e4]
[hpc-16-01:18706] [13] ./wrf.exe[0x404f3c]
[hpc-16-01:18706] [14] ./wrf.exe[0x244818a]
[hpc-16-01:18706] [15] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f8ffbb4d555]
[hpc-16-01:18706] [16] ./wrf.exe[0x404e39]
[hpc-16-01:18706] *** End of error message ***

and the WRF output files contain only the first hour.

When I ran the same setup on only 2 domains, I had no problems.
In both cases I used 40 CPUs.

Looking through the forum, errors of this type occur when the “time_step” parameter does not respect the “6 x dX” rule (time_step in seconds no larger than 6 times the grid spacing dx in km). In my settings it is respected!
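
For readers who want to check the rule on a nested setup, here is a minimal Python sketch. It assumes each nest runs with its parent's time step divided by parent_time_step_ratio, and the values used (time_step = 90 and ratios 1, 3, 5) are hypothetical, since the attached namelist.input is not reproduced in this thread. The function name check_time_steps is also just an illustration.

```python
def check_time_steps(time_step, dx_km, parent_time_step_ratio):
    """Check the 'time_step <= 6 x dx' rule of thumb for every domain.

    time_step              -- time step of d01 in seconds
    dx_km                  -- grid spacing of each domain in km, d01 first
    parent_time_step_ratio -- time step ratio of each domain to its parent
                              (1 for d01)

    Returns a list of (domain, step_seconds, limit_seconds, ok) tuples.
    Assumes each nest is the child of the previous domain in the list.
    """
    results = []
    step = float(time_step)
    for domain, (dx, ratio) in enumerate(zip(dx_km, parent_time_step_ratio), start=1):
        step = step / ratio      # a nest uses its parent's step divided by the ratio
        limit = 6.0 * dx         # rule of thumb: at most 6 s per km of grid spacing
        results.append((domain, step, limit, step <= limit))
    return results


# Hypothetical values for a 15-5-1 km setup (not taken from the attached namelist).
for domain, step, limit, ok in check_time_steps(90, [15, 5, 1], [1, 3, 5]):
    status = "ok" if ok else "too large"
    print(f"d{domain:02d}: time step {step:g} s, limit {limit:g} s -> {status}")
```

Passing this check only rules out the most obvious time step problem; a run can still become unstable with a time step that respects the rule.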

I attach the namelist.input and one of the 40 rsl.error files (for the second I changed the name).

I thank in advance anyone who wants to take an interest in my problem.

Andrea
 

Attachments

  • namelist.input
  • rsl-error-0039.txt
Hi Andrea,
The issue regarding the time_step is specific to when the rsl* files show "cfl" errors. Have you checked all of your files to see if there are any cfl errors present? You can do so with this command:
Code:
grep cfl rsl*

If nothing prints out, that likely isn't causing the problem. The problem could also be related to the number of processors you're using. The size of your d03 is significantly larger than d01/d02, so it could be that you need more processors for that domain, but you have to be careful to not have too many for your other domains. For more information on choosing an appropriate number of processors, see this FAQ. And for general information about segmentation faults with WRF, see this one.
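
As a rough illustration of the processor trade-off described above, the following Python sketch applies a commonly quoted rule of thumb: use at least (e_we/100) x (e_sn/100) processors for the largest domain and at most (e_we/25) x (e_sn/25) for the smallest one. The constants 100 and 25 and the domain sizes are assumptions made for this example, not values taken from this thread or from the attached namelist, and processor_range is a hypothetical helper name.

```python
import math


def processor_range(domains):
    """Estimate a (min_procs, max_procs) range from domain sizes.

    domains -- list of (e_we, e_sn) grid dimensions, one pair per domain.

    Assumed rule of thumb (not confirmed by this thread):
      minimum = (e_we / 100) * (e_sn / 100) for the largest domain
      maximum = (e_we / 25)  * (e_sn / 25)  for the smallest domain
    """
    largest = max(domains, key=lambda d: d[0] * d[1])
    smallest = min(domains, key=lambda d: d[0] * d[1])
    min_procs = math.ceil((largest[0] / 100) * (largest[1] / 100))
    max_procs = math.floor((smallest[0] / 25) * (smallest[1] / 25))
    return min_procs, max_procs


# Hypothetical sizes: a small outer domain and a much larger inner nest.
low, high = processor_range([(120, 120), (500, 500)])
if low > high:
    print(f"No processor count fits both domains (min {low} > max {high}).")
else:
    print(f"Use between {low} and {high} processors.")
```

When the minimum comes out larger than the maximum, as in this hypothetical case, no single processor count suits all domains, which is one reason changing the domain sizes or resolutions can help.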
 
My problem is related to an "elf" error, but none of the solutions suggested in
https://forum.mmm.ucar.edu/phpBB3/viewtopic.php?f=73&t=133

has resolved it.
 
Andrea,
I apologize for the delay in response while I was traveling. Did you try to use more processors to run this?
 
Hi Kwerner,

thanks for your reply.

For the moment it is not possible for me to use more than the 40 CPUs I have already used.

I worked around the problem by changing the resolutions of the 3 domains. Instead of 15-5-1 km I used 15-3-1 km and had no problems.

Thank you,
Andrea
 
Andrea,
I am glad to hear that you found a work-around for your problem. Thank you for updating the post!
 