
Segmentation fault on first nest move

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

milancurcic

I'm setting up a vortex-following run with WRF 4.2.1, built on IBM Power9 with the XL compilers.

The model runs; however, I hit a segmentation fault as soon as the inner nest makes its first move. I confirmed this by varying the time_to_move variable. The end of the log looks like this:

Code:
...
Timing for main: time 2019-08-30_00:01:55 on domain   2:    0.68377 elapsed seconds
Timing for main: time 2019-08-30_00:02:00 on domain   2:    0.69296 elapsed seconds
ATCF 2019-08-30_00:02:00    22.91   -67.75  995.2   45.5
 2019-08-30_00:02:00 vortex center (in nest x and y):  225.6396637 210.9163208
 2019-08-30_00:02:00 grid   center (in nest x and y):  200.0000000 200.0000000
 2019-08-30_00:02:00 disp          :  6.000000000 6.000000000
 2019-08-30_00:02:00 move (rel cd) :  1 1
  moving  2 1 1

  Signal received: SIGSEGV - Segmentation violation

  Traceback:

And here are the vortex-following related namelist parameters:

Code:
&domains
    time_step = 15
    time_step_fract_num = 0
    time_step_fract_den = 1
    max_dom = 2
    e_we = 664, 400
    e_sn = 700, 400
    e_vert = 40, 40
    p_top_requested = 5000
    num_metgrid_levels = 34
    num_metgrid_soil_levels = 4
    dx = 3000, 1000
    dy = 3000, 1000
    grid_id = 1, 2, 3
    parent_id = 1, 1, 2
    i_parent_start = 1, 471
    j_parent_start = 1, 208
    parent_grid_ratio = 1, 3
    parent_time_step_ratio = 1, 3
    feedback = 1
    smooth_option = 0
    vortex_interval  = 5, 5
    max_vortex_speed = 15, 15
    corral_dist = 8, 8
    track_level = 70000
    time_to_move = 0, 2
    use_surface = .false.
/
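One way to sanity-check the nest placement above, and how much room d02 has before it reaches the edge of d01, is to do the index arithmetic directly. This is a minimal sketch, assuming WRF's usual nesting rule that a nest's e_we and e_sn are one more than a multiple of parent_grid_ratio, and reading the log line `moving  2 1 1` as domain 2 shifting one parent cell in x and in y (our interpretation). Nest, check_nest and shifted are hypothetical helpers, not part of WRF.

```python
from dataclasses import dataclass, replace


@dataclass
class Nest:
    """Nest placement taken from the nest's column in &domains."""
    e_we: int
    e_sn: int
    i_parent_start: int
    j_parent_start: int
    parent_grid_ratio: int


def check_nest(nest: Nest, parent_e_we: int, parent_e_sn: int) -> dict:
    """Check nest dimensions and report remaining margins, in parent cells."""
    r = nest.parent_grid_ratio
    # Assumed rule: nest dimensions are one more than a multiple of the ratio.
    dims_ok = (nest.e_we - 1) % r == 0 and (nest.e_sn - 1) % r == 0
    # Parent-grid indices where the nest's far (east, north) edges fall.
    i_end = nest.i_parent_start + (nest.e_we - 1) // r
    j_end = nest.j_parent_start + (nest.e_sn - 1) // r
    margins = {
        "west": nest.i_parent_start - 1,
        "east": parent_e_we - i_end,
        "south": nest.j_parent_start - 1,
        "north": parent_e_sn - j_end,
    }
    return {"dims_ok": dims_ok, "i_end": i_end, "j_end": j_end, "margins": margins}


def shifted(nest: Nest, di: int, dj: int) -> Nest:
    """Nest position after moving (di, dj) parent cells."""
    return replace(nest,
                   i_parent_start=nest.i_parent_start + di,
                   j_parent_start=nest.j_parent_start + dj)


if __name__ == "__main__":
    d02 = Nest(e_we=400, e_sn=400, i_parent_start=471, j_parent_start=208,
               parent_grid_ratio=3)
    print(check_nest(d02, 664, 700))
    print(check_nest(shifted(d02, 1, 1), 664, 700))
```

With the values above, the nest's far edges fall at parent indices (604, 341), leaving 60 parent cells to the east initially and 59 after a (1, 1) move. So, under these assumptions, the first move does not bring d02 near the d01 boundary. That only rules out the simplest geometric cause; it says nothing about the code path that crashed.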

In case it helps, I have attached the full namelist.input and rsl.error.0000 files.

How can I best diagnose this?

Thank you!
Milan
 

Attachments

  • namelist.input
  • rsl.error.0000.txt

Hi Milan,
The problem may be that you're not using enough processors to run this. You're only using 40 processors, and for a domain of 664 x 700 you will likely need more than that. Even though the model fails at the first nest move, the failure could be related to the size of d01. Take a look at this FAQ, which helps you determine a good number of processors to use: https://forum.mmm.ucar.edu/phpBB3/viewtopic.php?f=73&t=5082
If that's not the problem, please send your new rsl.* files (it's best if you can package all of them in a *.tar file and send that). Thanks!
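The FAQ linked above gives a rule of thumb for choosing the number of MPI tasks from the domain size. The sketch below encodes it as we recall it (roughly (e_we/100) * (e_sn/100) tasks at the low end and (e_we/25) * (e_sn/25) at the high end); treat those constants as an assumption and check the FAQ for the authoritative values. processor_range is a hypothetical helper name.

```python
import math


def processor_range(e_we: int, e_sn: int) -> tuple[int, int]:
    """Rough (fewest, most) MPI task counts for one domain.

    Assumed rule of thumb: about 100 x 100 grid points per task at the low end
    and about 25 x 25 grid points per task at the high end.
    """
    fewest = math.floor((e_we / 100) * (e_sn / 100))
    most = math.floor((e_we / 25) * (e_sn / 25))
    return max(fewest, 1), most


if __name__ == "__main__":
    print("d01:", processor_range(664, 700))
    print("d02:", processor_range(400, 400))
```

For d01 (664 x 700) this gives roughly 46 to 743 tasks, so 40 is below the low end, consistent with the suggestion above. Applying the same rule to the 400 x 400 nest is our own extension; it gives roughly 16 to 256, which puts the 320-task rerun later in the thread above that nest's upper estimate. Whether that matters for this crash is not established here.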
 
Thank you, Kelly! I re-ran on 320 processors with seemingly the same result. I have attached all of the rsl files as rsl.tar. Thank you!
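When a run dies with hundreds of rsl files, a short script can show which ranks reported the segfault and how far each one got. This is a minimal sketch that assumes the files are named rsl.error.* in the current directory and that progress lines follow the "Timing for main" format in the log excerpt above; summarize_rsl and scan are hypothetical helper names. It only reports what appears in the text, not the cause of the crash.

```python
import glob
import re

# Matches progress lines such as
# "Timing for main: time 2019-08-30_00:02:00 on domain   2:    0.69296 elapsed seconds"
TIMING = re.compile(r"Timing for main: time (\S+) on domain\s+(\d+):")


def summarize_rsl(text: str) -> dict:
    """Last model time and domain reported, plus whether a 'moving' line and SIGSEGV appear."""
    last_time, last_domain = None, None
    lines = text.splitlines()
    for line in lines:
        m = TIMING.search(line)
        if m:
            last_time, last_domain = m.group(1), int(m.group(2))
    return {
        "last_time": last_time,
        "last_domain": last_domain,
        "saw_move": any(ln.strip().startswith("moving") for ln in lines),
        "sigsegv": "SIGSEGV" in text,
    }


def scan(pattern: str = "rsl.error.*") -> dict:
    """Summarize every file matching the (assumed) rsl file pattern."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as f:
            results[path] = summarize_rsl(f.read())
    return results


if __name__ == "__main__":
    for path, info in scan().items():
        if info["sigsegv"]:
            print(path, info)
```

Run it from the directory where rsl.tar was extracted; it prints a summary only for the files that contain SIGSEGV.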
 

Attachments

  • rsl.tar

Hi,
I attempted to run your case using your namelist and GFS-FNL data for the dates you are using. I ran with V4.2.1, compiled for vortex following, and I didn't get the problem you had. The only difference was that I set history_interval to hourly instead of every minute, because writing output every minute made the integration take far too long. Is there a reason you need the history interval to be 1 minute? If you set it to 60 instead, does that make any difference? Probably not, but I just wanted to check.
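history_interval is given in minutes, so the cost of 1-minute output can be put in terms of model time steps. This sketch uses the namelist values above (time_step = 15 and parent_time_step_ratio = 1, 3) and assumes each domain's step is its parent's step divided by its parent_time_step_ratio, with the domains forming a simple chain as they do here; steps_between_history_writes is a hypothetical helper.

```python
from fractions import Fraction


def steps_between_history_writes(time_step: int, time_step_ratios: list[int],
                                 history_interval_min: int) -> list[Fraction]:
    """Model time steps each domain takes between history writes.

    Assumes domain n is the child of domain n-1 (true for this two-domain case)
    and the same history_interval on every domain.
    """
    steps = []
    dt = Fraction(time_step)
    for ratio in time_step_ratios:
        dt = dt / ratio  # the coarse domain's ratio is 1, so its step is unchanged
        steps.append(Fraction(history_interval_min * 60) / dt)
    return steps


if __name__ == "__main__":
    print(steps_between_history_writes(15, [1, 3], 1))
    print(steps_between_history_writes(15, [1, 3], 60))
```

With history_interval = 1, d01 writes output every 4 of its 15 s steps and d02 every 12 of its 5 s steps; with history_interval = 60 those become 240 and 720.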
 
Thanks a lot; that confirms the segfault is not due to a namelist misconfiguration. I only set history_interval = 1 to diagnose this issue.

At this point, my only suspicion is that this is related to the Registry issue I reported in https://forum.mmm.ucar.edu/phpBB3/viewtopic.php?f=37&t=9444.

I will keep digging there and report back what I find.
 