
DFI run exits with exit code 11 in WRF 3.9


yandyof

New member
Hello,
I have a problem running WRF 3.9 with DFI. Some details below:
No physics_suite was set for my case; I just set the physics options in the old-fashioned way.
The program exits with exit code 11 after trying to write "Writing out initialized model state" for domain 3 of a three-domain, two-way nested set. I am running on a cluster with MPI and 40 processors.
Here is the backtrace info:

....
Writing out initialized model state
Writing out initialized model state
Writing out initialized model state

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7F39BE6D16F7
#1 0x7F39BE6D1D3E
#2 0x7F39BDBD824F
#3 0x11124A1 in output_wrf_
#4 0x108880D in __module_io_domain_MOD_open_w_dataset
#5 0x114C881 in dfi_write_initialized_state_
#6 0x114C9CF in dfi_write_initialized_state_recurse_
#7 0x114C9F8 in dfi_write_initialized_state_recurse_
#8 0x114C9F8 in dfi_write_initialized_state_recurse_

Also, after turning off dfi_write_filtered_input and running again, I got the same error, this time when trying to write the wrfout file for domain 3 at the analysis time. The wrfout files for domains d01 and d02 are written without problems. Here is the backtrace info:

...
Timing for Writing wrfout_d01_2017-02-26_00:00:00 for domain 1: 2.80144 elapsed seconds
Timing for processing lateral boundary for domain 1: 0.39290 elapsed seconds
Timing for Writing wrfout_d02_2017-02-26_00:00:00 for domain 2: 2.81932 elapsed seconds

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7F7555FA06F7
#1 0x7F7555FA0D3E
#2 0x7F75554A724F
#3 0x11124A1 in output_wrf_
#4 0x108880D in __module_io_domain_MOD_open_w_dataset
#5 0x11792D9 in open_hist_w_
#6 0x117B0BC in med_hist_out_.part.1 at mediation_integrate.f90:?
#7 0x117BA0F in med_before_solve_io_
#8 0x4721E1 in __module_integrate_MOD_integrate
#9 0x47282B in __module_integrate_MOD_integrate
#10 0x47282B in __module_integrate_MOD_integrate
#11 0x407A13 in __module_wrf_top_MOD_wrf_run

Below are the main namelist.input settings:

&time_control
input_from_file = .true.,.true.,.true.,
fine_input_stream = 0, 2, 2,

&domains
time_step = 15,
time_step_dfi = 15,
time_step_fract_num = 0,
time_step_fract_den = 1,
max_dom = 3,
i_parent_start = 1, 80, 80
j_parent_start = 1, 80, 80
e_we = 240, 241, 241
e_sn = 240, 241, 241
vert_refine_method = 0,0,0
e_vert = 38, 38, 38
parent_time_step_ratio = 1, 3, 4,
feedback = 1,
smooth_option = 0

&physics
mp_physics = 8, 8, 8
cu_physics = 0, 0, 0
ra_lw_physics = 4, 4, 4,
ra_sw_physics = 4, 4, 4,
sf_sfclay_physics = 1, 1, 1,
sf_surface_physics = 1, 1, 1,
bl_pbl_physics = 1, 1, 1,
sf_urban_physics = 1, 1, 1,

&dfi_control
dfi_opt = 3
dfi_nfilter = 7
dfi_write_filtered_input = .false.
dfi_write_dfi_history = .false.

NOTE: Interestingly, this configuration runs fine when DFI is turned OFF.


Any ideas? Could this be a memory (RAM) issue?
Thanks a lot in advance.
 
Hi...

A seg fault usually means an array is being accessed out of bounds (an invalid memory reference).

It *could* be a memory issue if an array allocation failed and the code didn't check for
that. Sometimes people forget to check them all when writing code.
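
Just as an illustration of what I mean by checking (generic Fortran, not actual WRF code):

program alloc_check
  implicit none
  real, allocatable :: buf(:,:)
  integer :: ierr, nx, ny
  nx = 240; ny = 240
  ! With stat= present, a failed allocation sets ierr /= 0 instead of
  ! aborting the program. If ierr is never tested, the code may later
  ! use the unallocated array and die with a seg fault like the one above.
  allocate(buf(nx,ny), stat=ierr)
  if (ierr /= 0) then
     write(0,*) 'allocation of buf failed for ', nx, ' x ', ny
     stop 1
  end if
  buf = 0.0
end program alloc_check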

If you are running MPI, try adding another node.
 
When running DFI over multiple domains with concurrent nesting, feedback between domains must be disabled.
Can you set feedback = 0, then try again?
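
For this test, the only change needed is the feedback entry in your &domains section; everything else can stay exactly as you posted it:

 feedback = 0,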
 
Hi, thanks for your replies.

It was odd that this configuration ran well with two domains (even with feedback on); the problem seemed to be related only to writing the data for domain 3. So I imagined it was a memory issue, but it was not: I increased the number of compute nodes, added &namelist_quilt settings for better I/O management, and set feedback to 0, and none of those solved the problem.

Finally, while looking at run-time efficiency, I changed the time step from 15 to 18 seconds and the parent_time_step_ratio of domain 3 from 4 to 3, and the program ran fine. I am guessing the model did not like either the even time-step ratio or a domain-3 time step of 1.25 seconds (more than one decimal place). The program now runs even with feedback on. Moreover, I noticed that in share/dfi.F the model temporarily sets feedback to 0 for the DFI integration, regardless of how it is set in the namelist.
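
In namelist terms, the fix amounts to these two entries in &domains (all other entries stayed as in the namelist I posted above):

 time_step              = 18,
 parent_time_step_ratio = 1, 3, 3,

With these values domain 3 runs at 18/3/3 = 2 s per step, instead of 15/3/4 = 1.25 s as before.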

This case is closed as far as I am concerned. However, I do think this problem should be investigated in the WRF code and fixed, or at least explicitly mentioned somewhere in the WRF Users' Guide.
 
Please post the complete namelist.input files for me to take a look (the one that worked and the one that failed). Thank you.
 