I'm running a 3 km / 1 km nested simulation over southeastern Canada and the upper Northeast US for a 96-hour period. The simulation uses the BEP/BEM urban scheme with the MYNN PBL. It ran to completion using the default geo_em files as input to metgrid, followed by real.exe and wrf.exe. When I switched to geo_em files that account for LCZs (urbanized via WUDAPT-to-WRF), metgrid and real ran without issues, but wrf.exe now crashes with a seg-fault on the second timestep.
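As a first sanity check on the urbanized files, I compared a few of the LCZ-related fields between the two sets of geo_em files. The Python sketch below is just that, a sketch: the filenames are placeholders, and I'm assuming the WUDAPT-to-WRF output carries LU_INDEX and FRC_URB2D in the usual way.
Code:
# Sanity check on the LCZ-urbanized geo_em files; filenames are placeholders.
from netCDF4 import Dataset
import numpy as np

for path in ["geo_em.d02.default.nc", "geo_em.d02.lcz.nc"]:  # hypothetical names
    with Dataset(path) as nc:
        lu = np.asarray(nc.variables["LU_INDEX"][0])      # (south_north, west_east)
        print(path)
        print("  NUM_LAND_CAT :", nc.getncattr("NUM_LAND_CAT"))
        print("  LU categories:", np.unique(lu.astype(int)))
        if "FRC_URB2D" in nc.variables:                   # urban fraction added by w2w
            frc = np.asarray(nc.variables["FRC_URB2D"][0])
            # Urban fraction outside [0, 1] or NaNs here would be a red flag.
            print("  FRC_URB2D    : min %.3f  max %.3f  NaNs %d"
                  % (np.nanmin(frc), np.nanmax(frc), int(np.isnan(frc).sum())))
If the urban fraction or category range looked off, that would point at the static inputs rather than wrf.exe itself.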
Snippet of rsl.error.0000 around error occurrence and traceback (debug_level 100):
Code:
d02 2022-05-19_00:00:00 calling inc/HALO_EM_SCALAR_E_5_inline.inc
d02 2022-05-19_00:00:05 module_integrate: back from solve interface
Timing for main: time 2022-05-19_00:00:05 on domain 2: 9.71652 elapsed seconds
d02 2022-05-19_00:00:05 module_integrate: calling solve interface
d02 2022-05-19_00:00:05 grid spacing, dt, time_step_sound= 1000.000 5.000000 4
d02 2022-05-19_00:00:05 calling inc/HALO_EM_MOIST_OLD_E_7_inline.inc
d02 2022-05-19_00:00:05 calling inc/PERIOD_BDY_EM_MOIST_OLD_inline.inc
d02 2022-05-19_00:00:05 calling inc/HALO_EM_A_inline.inc
d02 2022-05-19_00:00:05 calling inc/PERIOD_BDY_EM_A_inline.inc
d02 2022-05-19_00:00:05 calling inc/HALO_EM_PHYS_A_inline.inc
d02 2022-05-19_00:00:05 Top of Radiation Driver
d02 2022-05-19_00:00:05 calling inc/HALO_PWP_inline.inc
d02 2022-05-19_00:00:05 in MYNNSFC
[uagc22-06:95321:0:95321] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xfffffffe07910de0)
==== backtrace (tid: 95321) ====
0 /lib64/libucs.so.0(ucs_handle_error+0x2dc) [0x14706130b13c]
1 /lib64/libucs.so.0(+0x2c31c) [0x14706130b31c]
2 /lib64/libucs.so.0(+0x2c4ea) [0x14706130b4ea]
3 ./wrf.exe() [0x2c37d3a]
4 ./wrf.exe() [0x2c37a44]
5 ./wrf.exe() [0x2c31f56]
6 ./wrf.exe() [0x2c3050d]
7 ./wrf.exe() [0x224a903]
8 ./wrf.exe() [0x1b95bba]
9 ./wrf.exe() [0x150c337]
10 ./wrf.exe() [0x13402bc]
11 ./wrf.exe() [0x5918ff]
12 ./wrf.exe() [0x591f16]
13 ./wrf.exe() [0x414e51]
14 ./wrf.exe() [0x414e0f]
15 ./wrf.exe() [0x414da2]
16 /lib64/libc.so.6(__libc_start_main+0xe5) [0x1472bfe518a5]
17 ./wrf.exe() [0x414cae]
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
wrf.exe 0000000002DB9F2A for__signal_handl Unknown Unknown
libpthread-2.28.s 00001472C02009F0 Unknown Unknown Unknown
wrf.exe 0000000002C37D3A Unknown Unknown Unknown
wrf.exe 0000000002C37A44 Unknown Unknown Unknown
wrf.exe 0000000002C31F56 Unknown Unknown Unknown
wrf.exe 0000000002C3050D Unknown Unknown Unknown
wrf.exe 000000000224A903 Unknown Unknown Unknown
wrf.exe 0000000001B95BBA Unknown Unknown Unknown
wrf.exe 000000000150C337 Unknown Unknown Unknown
wrf.exe 00000000013402BC Unknown Unknown Unknown
wrf.exe 00000000005918FF Unknown Unknown Unknown
wrf.exe 0000000000591F16 Unknown Unknown Unknown
wrf.exe 0000000000414E51 Unknown Unknown Unknown
wrf.exe 0000000000414E0F Unknown Unknown Unknown
wrf.exe 0000000000414DA2 Unknown Unknown Unknown
libc-2.28.so 00001472BFE518A5 __libc_start_main Unknown Unknown
wrf.exe 0000000000414CAE Unknown Unknown Unknown
I use 72 processors for this simulation, so I don't think the decomposition is the issue. I also see no CFL errors in any of the rsl files (I checked every rank, not just rank 0; a quick scan is sketched below). In my job script (bash), I call ulimit -s unlimited as well as setenv MP_STACK_SIZE 64000000.
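In case it's useful, this is the kind of quick scan I mean; it's a minimal sketch with nothing WRF-specific in it, just matching "cfl" in each rank's rsl file:
Code:
# Scan every rsl file for CFL warnings; a single rank can report them
# while rsl.error.0000 stays clean.
import glob

for path in sorted(glob.glob("rsl.*.*")):
    with open(path, errors="replace") as f:
        hits = [ln.rstrip() for ln in f if "cfl" in ln.lower()]
    if hits:
        print(f"{path}: {len(hits)} CFL line(s); first: {hits[0]}")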
I noticed a thread from a few years ago (SegFault in MYNNSFC) describing a very similar issue to mine, with a few differences, most notably in e_vert and eta_levels (a rough layer-thickness estimate for these levels follows the excerpt):
Code:
e_vert = 51, 51, 51,
eta_levels = 1.,
0.998743415,0.99748677,0.996230185,0.9949736,0.993716955,
0.992334723,0.990814209,0.989141703,0.987301886,0.98527813,
0.983051956,0.980603218,0.977909565,0.974946558,0.971687257,
0.968101978,0.964158237,0.959820092,0.955048144,0.949799001,
0.94402492,0.937673509,0.930686891,0.923001587,0.914547801,
0.905248582,0.895019472,0.883767486,0.871390283,0.857775331,
0.842798889,0.826324821,0.80820334,0.788269699,0.7663427,
0.742223024,0.715691328,0.68650645,0.65440315,0.619089544,
0.580244482,0.537514985,0.49051252,0.438809812,0.381936818,
0.319376528,0.250560224,0.17486228,0.0915945247,0.,
/
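For context on those eta_levels, here's a back-of-the-envelope estimate of the near-surface layer depths they imply. It's only a hydrostatic sketch with assumed values (the p_top, surface pressure, and density below are placeholders, not from my namelist), but it suggests the lowest layer is on the order of 10 m deep, i.e. a very thin first layer of the kind surface-layer schemes can be sensitive to.
Code:
# Rough layer-thickness estimate from the eta_levels above.
# Assumptions (not from the namelist): p_top = 5000 Pa, psfc = 101325 Pa,
# constant near-surface density of 1.2 kg m^-3.
g, rho = 9.81, 1.2
p_top, p_sfc = 5000.0, 101325.0

eta = [1.0, 0.998743415, 0.99748677, 0.996230185]  # first few levels only
mu = p_sfc - p_top                                 # column mass proxy
for k in range(len(eta) - 1):
    dp = (eta[k] - eta[k + 1]) * mu                # pressure thickness of layer
    dz = dp / (rho * g)                            # hydrostatic depth estimate
    print(f"layer {k + 1}: d(eta) = {eta[k] - eta[k + 1]:.6f}  dz ~ {dz:.1f} m")
With these numbers each of the lowest layers comes out around 10 m deep, so the first mass level sits roughly 5 m above ground.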
I then tried modifying the zolri() subroutine in phys/module_sf_mynn.F, applying the fix implemented by the user who hit the same issue (see: Divide by zero error in phys/module_sf_mynn.F sub zolri() · Issue #1386 · wrf-model/WRF). This was also unsuccessful.
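For anyone not familiar with that issue: as I understand it, zolri() solves iteratively for z/L, and the reported crash comes from an update step that divides by a difference of function values which can collapse to zero. The Python below is only a generic illustration of that failure mode and of the guard pattern; it is not the actual Fortran fix from #1386.
Code:
# Generic secant-style root find with a guarded denominator; an
# illustration of the divide-by-zero failure mode, not WRF code.
def secant_guarded(f, x1, x2, tol=1e-6, max_iter=20, eps=1e-12):
    f1, f2 = f(x1), f(x2)
    for _ in range(max_iter):
        denom = f2 - f1
        if abs(denom) < eps:       # the unguarded version divides here
            break                  # fall back to the last iterate instead
        x3 = x2 - f2 * (x2 - x1) / denom
        if abs(x3 - x2) < tol:
            return x3
        x1, f1, x2, f2 = x2, f2, x3, f(x3)
    return x2

# Example: root of x**2 - 2
print(secant_guarded(lambda x: x * x - 2.0, 1.0, 2.0))  # ~1.414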
Any help would be greatly appreciated! I've attached a copy of my namelist.input, namelist.wps, wrf.log, and my rsl.error.0000 files.