
Can't consistently run WRF4.4 with urban model on

Rtsquared

I've run into a really strange error today. The attached namelist.input file works when run from 0Z on May 1st through 0Z on May 8th, 2018. However, when I try to run the next week (re-initializing the run instead of using restart files, as I had previously, since restarts continuously built up a larger cold bias over time), it simply fails. There is no error message to be found and no fatal call; it just quits three minutes into the run, as soon as it begins to iterate. I have tried different dates for the second week (the attached file is my attempt at 5-day runs instead of 7-day runs), but that changes nothing.

There is an odd bug when using the WRF urban model where restart_interval and history_interval are linked: you need very frequent restart files if you want history output at a decent temporal resolution. For example, on May 1st a 3-hour restart_interval causes WRF to fail, but a 2-hour restart_interval works when the history interval is 60 minutes. Changing those values does nothing here, though (I've tried reducing restart_interval to 30 minutes while increasing history_interval to 120 minutes; no change). Changing time_step or radt also appears to have no impact, at least not when trying to run the second week, so right now I'm stumped. Turning off the urban model does let the run complete, which is how I know the problem is related to it, but I need it on. It's super odd that this configuration works, and then it doesn't.
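For concreteness, here is a minimal sketch of the &time_control entries involved in that restart/history coupling; the interval values are the ones quoted above, and the layout is standard namelist syntax rather than a copy of the attached file:

&time_control
 history_interval = 60,  60,  60,   ! minutes, per domain
 restart          = .false.,
 restart_interval = 120,            ! minutes; 180 crashed on May 1st, 120 ran
/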

I'm running this set of simulations using NAM 12 km analysis data and NLCD 2011 land-use data. Other additions I make have no impact on whether WRF will run or not; attempting to run wrf.exe immediately after real.exe completes still fails.

Feb 1 edit: The attached namelist doesn't actually work as-is for the first week of May, but changing the time step from 60 to 48 allows it to run with a restart_interval of 120 or 180. It still fails at other dates, though; I even got a "Flerchinger USEd in NEW version. Iterations = 10" message when trying to run the third week of May.
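The time-step change is a one-line edit in &domains; again a sketch with the values quoted above (the usual stability guidance is to keep time_step at or below 6*dx, with dx in km):

&domains
 time_step = 48,   ! was 60; failed at 60, ran at 48 for the first week of May
/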
 

Attachments

  • namelist_Jan31.input
Did you check all the rsl files for the failed case? If the model crashes immediately after wrf.exe starts, it often indicates something wrong in the input data. Please double-check your data and all the rsl files; there should be an error message somewhere in at least one of them.
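For example, a case-insensitive sweep over all the rsl files usually surfaces the failure; the patterns below are common ones, not an exhaustive list:

grep -i -E "error|fatal|cfl|sigsegv|segmentation" rsl.out.* rsl.error.*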
 
I had been searching for "fatal" and "error" and finding nothing in the rsl files. Turns out it's a segmentation fault.

[r1n23:22363:0:22363] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xfffffffe06bf3420)
==== backtrace (tid: 22363) ====
0 0x000000000004d445 ucs_debug_print_backtrace() ???:0
1 0x000000000339e1e5 module_sf_sfclayrev_mp_psim_stable_() ???:0
2 0x00000000033994e9 module_sf_sfclayrev_mp_sfclayrev1d_() ???:0
3 0x0000000003397fda module_sf_sfclayrev_mp_sfclayrev_() ???:0
4 0x0000000002a9395d module_surface_driver_mp_surface_driver_() ???:0
5 0x0000000002196527 module_first_rk_step_part1_mp_first_rk_step_part1_() ???:0
6 0x0000000001705fdb solve_em_() ???:0
7 0x00000000014e5178 solve_interface_() ???:0
8 0x000000000059754b module_integrate_mp_integrate_() ???:0
9 0x0000000000597b68 module_integrate_mp_integrate_() ???:0
10 0x0000000000597b68 module_integrate_mp_integrate_() ???:0
11 0x0000000000417fe1 module_wrf_top_mp_wrf_run_() ???:0
12 0x0000000000417f94 MAIN__() ???:0
13 0x0000000000417f22 main() ???:0
14 0x0000000000022555 __libc_start_main() ???:0
15 0x0000000000417e29 _start() ???:0
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
 
My initial thought was that this error had something to do with the surface-layer scheme, given the backtrace, but I'm still getting segmentation faults even after changing it (the scheme swap is sketched in a namelist snippet after this trace):

WRF NUMBER OF TILES = 1
WOULD GO OFF TOP: MSKF_PARA I,J,DPTHMX,DPMIN 274 308 NaN 5000.000
[r1n06:697 :0:697] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xfffffffe06af61a4)
==== backtrace (tid: 697) ====
0 0x000000000004d445 ucs_debug_print_backtrace() ???:0
1 0x000000000317cf82 module_cu_mskf_mp_mskf_eta_para_() ???:0
2 0x000000000317289a module_cu_mskf_mp_mskf_cps_() ???:0
3 0x0000000002a82e4b module_cumulus_driver_mp_cumulus_driver_() ???:0
4 0x00000000021b0bc3 module_first_rk_step_part1_mp_first_rk_step_part1_() ???:0
5 0x0000000001705fdb solve_em_() ???:0
6 0x00000000014e5178 solve_interface_() ???:0
7 0x000000000059754b module_integrate_mp_integrate_() ???:0
8 0x0000000000597b68 module_integrate_mp_integrate_() ???:0
9 0x0000000000417fe1 module_wrf_top_mp_wrf_run_() ???:0
10 0x0000000000417f94 MAIN__() ???:0
11 0x0000000000417f22 main() ???:0
12 0x0000000000022555 __libc_start_main() ???:0
13 0x0000000000417e29 _start() ???:0
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
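Reading the two traces: module_sf_sfclayrev is the revised MM5 surface layer (sf_sfclay_physics = 1 in the standard WRF 4.x numbering) and module_cu_mskf is the multi-scale Kain-Fritsch cumulus scheme (cu_physics = 11). The surface-layer swap mentioned above is therefore a &physics edit along these lines; the option values are the standard ones, not a copy of the poster's namelist:

&physics
 sf_sfclay_physics = 1,   ! revised MM5 scheme, the one named in the first trace
 cu_physics        = 11,  ! multi-scale KF, the one named in the second trace
/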
 
There are some weird issues with the VEGFRA variable, which is where the crash appears to be coming from; it looks like the problem probably stems from using NLCD with the Noah LSM. I will try again using MODIS and Noah-MP (which I wanted to use anyway, but it is incompatible with NLCD); hopefully that will fix the issue.
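A quick way to check whether the vegetation-fraction field is already bad before wrf.exe ever runs is to dump it from the real.exe output. ncdump ships with NetCDF, and wrfinput_d01 is the conventional file name (adjust the domain suffix as needed):

ncdump -v VEGFRA   wrfinput_d01 | less
ncdump -v LU_INDEX wrfinput_d01 | less    # compare against the land-use categories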
 
NLCD data only works with SLABSCHEME, LSMSCHEME, and PXLSMSCHEME. Also note that NLCD only covers the US, with no data available for other areas of the world. Please combine NLCD and MODIS when your domain extends beyond the US, and try the PX or Noah LSM.
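In namelist terms, those three schemes map to the following sf_surface_physics options (standard WRF 4.x numbering; pick one value):

&physics
 sf_surface_physics = 2,   ! 1 = thermal diffusion slab (SLABSCHEME)
                           ! 2 = Noah (LSMSCHEME)
                           ! 7 = Pleim-Xiu (PXLSMSCHEME)
/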
 
I was using NLCD + Noah LSM + BEP, which in theory should work (my outer domain is CONUS), but in practice it doesn't really. I did somehow manage to get from May 1 through September 15 using month-long runs and restart files, but it was a pain to set up and troubleshoot, and it ended up with a cold bias that grew over time, becoming massive by July.

I switched to MODIS land use (MODIS_30s+30s) and everything is running much more easily (though the weird restart bug is still around: with urban physics on, the restart file cannot be created, which stops the simulation from continuing, whenever the restart interval is much larger than the history interval), and I can even run BEM now. The weird VEGFRA mess still exists (see images), but the model runs anyway, so I'm not sure what's up with that. It's still very strange that I was able to run with the previous settings on May 1st but not at a later date.

VEGFRA around NYC (NLCD and MODIS appear very similar here; blue is 0%). This pattern also shows up in the outer domains, which don't have urban physics turned on, and VEGFRA isn't a variable in the met_em files, so something is going wrong in real.exe.
[image: 1675890554883.png]
NLCD landmask (at 444 m resolution); MODIS is similar but lower resolution.
[image: 1675890609484.png]
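For reference, the land-use switch described above is made in the &geogrid section of namelist.wps. A sketch, assuming the resolution string quoted in the post; the exact NLCD string varies by dataset, so check GEOGRID.TBL for what your installation accepts:

&geogrid
 geog_data_res = 'modis_30s+30s', 'modis_30s+30s', 'modis_30s+30s',
 ! previously an NLCD string such as 'nlcd2011_30m+30s'
/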

VEGFRA doesn't line up at all with the land use index as in MODIS almost the entire domain is LU=13 (urban and built up) and NLCD has 3 different urban types that in no way match VEGFRA making me wonder where it comes from. Maybe it's not used. Maybe I need to download a fresh install and use that as I had to make a few minor adjustments in /phys to get it to run in the first place (an issue related to NLCD and LCZ maps). Leaf area index (LAI) looks fine with NLCD and is properly tied to land use type in MODIS, but SHDMAX and SHDMIN (I can use Noah MP with MODIS so I have) have the same issues VEGFRA does. This issue can hopefully be fixed by using a LCZ map (which couldn't be done with NLCD until 4.4.2, and even then I think there were still some vegetation issues) which I think I will attempt again as MODIS is lower resolution than I would like and only has one urban type. How does USGS compare?
 
The VEGFRA issue is an urban one (as defined by LU_INDEX or IVGTYP):
[image: d03_weird_urban_vegfrac.png]
I've spent most of the day in the code trying to figure out why this is happening, with little success. In Noah-MP, FVEG is the variable used for vegetation fraction, and it is very different from VEGFRA, though it is also wrong: for some reason urban areas have an FVEG of 0.96 (I get the feeling it was supposed to be 0.04, because that makes more sense), and I haven't been able to figure out where that number comes from either. I would think that FVEG = 1 - FRC_URB2D, but I have yet to find a statement that does something like that. FRC_URB2D is supposed to be either 0.7 or 0.9 for MODIS/USGS IVGTYP = "is urban" (depending on whether LCZs are turned on in namelist.input), so VEGFRA/FVEG/SHDFAC (the variable the Noah LSM uses for this purpose; unlike FVEG, it does not get output) should be 0.1, not 0.96 or the odd mosaic seen in the image above. Any help tracking down this bug would be appreciated.
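One way to hunt for where FVEG is set is to grep the physics sources directly. module_sf_noahmpdrv.F is the Noah-MP driver in the WRF source tree; the second command is just a broad sweep, not a confirmed location:

grep -n  "FVEG"      phys/module_sf_noahmpdrv.F
grep -rn "FRC_URB2D" phys/ | grep -i veg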
 
VEGFRA is derived from the static input data. I am not sure whether it is always consistent with the land-use type; depending on the various resolutions, there may be some inconsistency between the two fields.
REAL does do some adjustment to make sure surface data (e.g., soil type, vegetation type, soil moisture, TSK, etc.) are consistent. However, VEGFRA does not seem to be included in the variables that get adjusted.
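If I remember the WPS/real workflow correctly, the monthly climatological green fraction travels through geogrid/metgrid as GREENFRAC, and real.exe interpolates it to the run date to produce VEGFRA, which would explain why VEGFRA is absent from the met_em files. A quick provenance check (the met_em date below is taken from the post; adjust to your own files):

ncdump -h met_em.d01.2018-05-01_00:00:00.nc | grep -i -E "greenfrac|vegfra"
ncdump -h wrfinput_d01 | grep -i -E "greenfrac|vegfra"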
 