What is the most common reason for a segmentation fault?

Moderator: NCAR/MMM

Post by kwerner » Thu Aug 30, 2018 12:04 am

Segmentation faults can be difficult to track down. As there isn't usually a clear error message, it can take some trial and error to figure out the problem.

1) A segmentation fault is often the result of using too many or too few processors for the size of the domain, or of a bad decomposition. See the FAQ on choosing an appropriate number of processors based on the size of your domain.

2) Sometimes it can be the result of a lack of disk space. Check how much space you have left available for the files to be written. If your domain is large or very high resolution, the output files will be much larger (sometimes a few GB).
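
A quick way to check the disk space mentioned above is sketched here. It assumes you run it from the directory where the model writes its output; nothing in it is WRF-specific.

```shell
# Available space (in KB) on the filesystem that holds the current directory.
# -P gives the portable one-line-per-filesystem format, -k reports 1024-byte blocks.
free_kb=$(df -Pk . | awk 'NR==2 {print $4}')
echo "Free space here: $((free_kb / 1024 / 1024)) GB"

# Space already used by the files in this directory (in KB).
du -sk .
```

Compare the free space against the size of one output file multiplied by the number of output times you expect.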

3) If the model seg-faults right at the beginning of the run, it can often mean there is something wrong with the input data. Make sure to check your met_em* files to see if you notice anything odd in various variables. Check all variables and all levels.
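
As a first pass at the input check described above, the sketch below prints the header of each met_em file. It assumes the netCDF utility ncdump is installed and that the files sit in the current directory; the header only shows dimensions, variables and attributes, so you still need a viewer of your choice to look at the values on every level.

```shell
# Loop over the met_em files in the current directory and show each header.
checked=0
for f in met_em.d0*; do
  [ -e "$f" ] || continue            # the pattern matched nothing
  echo "== $f"
  if command -v ncdump >/dev/null 2>&1; then
    ncdump -h "$f" | head -n 40      # dimensions, variables, global attributes
  else
    echo "ncdump not found; install the netCDF utilities to inspect this file"
  fi
  checked=$((checked + 1))
done
echo "files inspected: $checked"
```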

4) Many times a seg-fault means there is a CFL error: the model has become unstable, typically due to steep terrain or very strong convection. If this occurs, try the following:
a. Reduce the time step. The standard recommendation for time_step (in seconds) is 6xDX, with DX in km (e.g., if your DX = 30000 m, which is 30 km, you should not set time_step to anything larger than 180). If you are still getting CFL errors, you can try to reduce it to something more like 4xDX or 3xDX. Sometimes this works, but not always.
b. If the CFL errors happen along the boundary zones, add smooth_cg_topo = .true. to the &domains section of the namelist before running real. This option smooths the outer rows/columns of the coarse model grid to match the low-resolution topography that comes with the driving data.
c. If the CFL errors occur near complex terrain, try setting epssm = 0.2 (up to 0.5) in the &dynamics section to see if that makes a difference. This option slightly forward-centers the vertical pressure gradient (sound wave) terms in an effort to damp 3-d divergence.
d. You can also try setting w_damping = 1, also in &dynamics.
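
The time step arithmetic above can be sketched as a small shell function. The function name max_time_step is made up for illustration; it takes DX in meters, as it appears in the namelist, and a factor in seconds per km of grid spacing.

```shell
# Upper bound on time_step (seconds) for a grid spacing given in meters.
# factor defaults to 6 (the standard guidance); 4 or 3 are the more
# conservative fallbacks when CFL errors persist.
max_time_step() {
  dx_m=$1
  factor=${2:-6}
  echo $((factor * dx_m / 1000))
}

for factor in 6 4 3; do
  echo "DX = 30000 m, ${factor}xDX: time_step <= $(max_time_step 30000 "$factor") s"
done
```

For DX = 30000 this gives limits of 180, 120 and 90 seconds.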

5) A segmentation fault error could be due to a memory issue. Try typing one of the following to see if it helps:
a. In csh or tcsh: setenv MP_STACK_SIZE 64000000 (or set OMP_STACKSIZE, the standard OpenMP variable for thread stack size)
b. If you are using csh or tcsh, try this: limit stacksize unlimited
c. If you are using sh or bash, use this command: ulimit -s unlimited
This may not solve your problem, but the default stack size is often quite small, and a stack that is too small for the model's arrays can cause a segmentation fault.
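
For sh or bash, the stack size steps above can be combined as in the sketch below. The 64M value for OMP_STACKSIZE is only an illustration, and raising the limit will fail if your system's hard limit is lower.

```shell
# Show the current soft stack limit (in KB, or "unlimited").
echo "stack limit before: $(ulimit -s)"

# Try to raise it; report the hard limit if the request is refused.
ulimit -s unlimited 2>/dev/null || \
  echo "could not raise the stack limit (hard limit: $(ulimit -H -s))"
echo "stack limit now: $(ulimit -s)"

# For OpenMP builds, thread stacks are sized separately (illustrative value).
export OMP_STACKSIZE=64M
```

The limit only applies to the shell that sets it and to the processes that shell starts, so put these lines in the same session or job script that launches the model.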