Model run crashes over India (v4.5)

obrousse

New member
Hi everyone,

I have been trying to run real-case simulations (12 km -- 3 km -- 1 km nested domains) over the city of Bhubaneshwar, following some idealised simulations I ran over the same city. I am trying to deviate as little as possible from the configuration of my idealised simulations, but I am facing issues: the model runs for only 2 seconds and then crashes with a SIGSEGV. I am running the model compiled with gfortran on 36 CPUs; I could run on another machine with more CPUs if needed.

Some basic information:
  • Model is forced using ERA5 data at 6 hourly time steps
  • Urban classes are based on the global LCZ embedded in WRF
  • BEP-BEM activated with Bougeault Lacarrere
  • NOAH-MP is used instead of NOAH-LSM
  • Initial conditions are also derived from ERA5 including orography
I tried running the 1 km domain directly nested into the ERA5 data; it ran slightly longer but still crashed within the first modelled hour (at ~30 minutes).
I have already carried out several tests:
  1. diff_opt turned from 2 to 1
  2. hybrid_opt turned from 2 to 0
  3. epssm increased through several values
  4. smoothing of the mountain vs no smoothing
  5. timestep reduced from 30 sec to 5 sec
  6. number of eta_levels reduced from 55 to 45
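As a quick sanity check on test 5, here is a minimal sketch of the commonly quoted WRF rule of thumb that time_step (in seconds) should not exceed about 6 times the grid spacing in km. The function name and the default factor are illustrative assumptions, not part of WRF. The post does not say which domain the 30 s and 5 s values applied to, and in a nested run the inner domains take their time step from parent_time_step_ratio rather than from time_step directly.

```python
def max_recommended_time_step(dx_km: float, factor: float = 6.0) -> float:
    """Upper bound on the time step in seconds from the ~6 * dx (km) rule of thumb.

    The name and the default factor are illustrative; this is a rough
    guideline only, not a stability guarantee.
    """
    return factor * dx_km


# Grid spacings quoted in the post: 12 km -> 3 km -> 1 km
for dx_km in (12.0, 3.0, 1.0):
    limit = max_recommended_time_step(dx_km)
    print(f"dx = {dx_km:4.1f} km -> time_step <= about {limit:.0f} s")
```

If the time step already sits below this bound for the domain in question, reducing it further is unlikely to help on its own, which would point towards the physics or the input data instead.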
I attach the namelists for WPS and WRF. real.exe runs fine; the problem occurs when running wrf.exe.
The error message with debug_level = 1000 is as follows:
d03 2024-05-28_18:00:00+05/12 DEBUG wrf_timetoa(): returning with str = [2024-05-28_18:00:00]
d03 2024-05-28_18:00:00+05/12 call radiation_driver
d03 2024-05-28_18:00:00+05/12 Top of Radiation Driver
d03 2024-05-28_18:00:00+05/12 calling inc/HALO_PWP_inline.inc
d03 2024-05-28_18:00:00+05/12 call surface_driver
d03 2024-05-28_18:00:00+05/12 in SFCLAY

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Kind regards,
Oscar
 

Attachments

  • namelist_Bub_Oscar.zip
    3 KB · Views: 1
Hi Oscar,

Are you running this with multiple processors? Do you have rsl* files you could share? If so, please package them all together in a single *.tar or zipped file and attach that. Thanks!
 
Hi! Sorry for the late reply; please find the rsl files attached. I do not think they contain any useful information. One thing we are investigating is the ptop level. In our idealised simulations we used a ztop of 16000 to capture the overshooting tops and the heights above the Himalayas, but this also seems to create a lot of instabilities.

I am not sure what the best practice is for such heights. According to our recent calculations, it would mean having a ptop of about 300.
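For cross-checking a ztop to ptop conversion, here is a rough sketch using the International Standard Atmosphere. This is an assumption made purely for illustration: the real atmosphere over India in late May is warmer than the ISA profile, so the value for the actual case will differ. The function name is hypothetical, and the result is in Pa, the unit WRF expects for p_top_requested.

```python
import math

# International Standard Atmosphere constants (assumed profile, not case-specific)
G = 9.80665        # gravitational acceleration, m s^-2
M = 0.0289644      # molar mass of dry air, kg mol^-1
R = 8.31446        # universal gas constant, J mol^-1 K^-1
T0 = 288.15        # sea-level temperature, K
P0 = 101325.0      # sea-level pressure, Pa
LAPSE = 0.0065     # tropospheric lapse rate, K m^-1
H_TROP = 11000.0   # height of the ISA tropopause, m


def isa_pressure_pa(height_m: float) -> float:
    """Approximate ISA pressure in Pa at a geopotential height of 0 to 20000 m."""
    exponent = G * M / (R * LAPSE)
    if height_m <= H_TROP:
        # Troposphere: temperature decreases linearly with height
        return P0 * (1.0 - LAPSE * height_m / T0) ** exponent
    # Lower stratosphere: isothermal layer above the tropopause
    p_trop = P0 * (1.0 - LAPSE * H_TROP / T0) ** exponent
    t_trop = T0 - LAPSE * H_TROP
    return p_trop * math.exp(-G * M * (height_m - H_TROP) / (R * t_trop))


if __name__ == "__main__":
    ztop_m = 16000.0  # model top height quoted in the post, assumed to be in metres
    print(f"ISA pressure at {ztop_m:.0f} m: {isa_pressure_pa(ztop_m):.0f} Pa")
```

A sounding or the ERA5 profile for the actual domain would give a better estimate than the standard atmosphere used here.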
 

Attachments

  • rsl_files_OB.tar.gz
    33.1 KB · Views: 1
Thanks for sending that. A few of the rsl files show this error, with a backtrace:

Code:
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x2aeac03763ff in ???
#1  0x195deba in __module_sf_bep_bem_MOD_bep1d
        at /home/ucbqocb/Source/wrf/builds/WRF_OSCAR_IDEAL_ANDREA/phys/module_sf_bep_bem.f90:2020

So it looks like the issue is happening around line 2020 in module_sf_bep_bem.f90. You may want to try adding a couple of print statements to track it down. The .f90 file is generated during compilation, so you'll need to find the corresponding location in module_sf_bep_bem.F and make the edits there instead. You will then have to recompile the code, but you don't need to issue a 'clean -a' or reconfigure. You should be able to just compile, and it should be quicker than your first compile. After it compiles, run wrf again and check for your prints in the rsl* files. You may have to do this a few times to track down where the problem is coming from.
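Since only a few of the rsl files contain the backtrace, a small script can help find them after each run. The following is a minimal sketch, assuming the rsl.* files are in the current run directory and that the backtrace lines look like the gfortran output quoted above. The function names are hypothetical helpers, not part of WRF.

```python
import re
from pathlib import Path

# Patterns based on the gfortran messages quoted in this thread
SIGNAL_RE = re.compile(r"Program received signal (SIG[A-Z]+)")
FRAME_RE = re.compile(r"^#\d+\s+0x[0-9a-f]+ in (\S+)")


def scan_rsl_text(text):
    """Return (signal, first named backtrace frame or None) for each crash in the text."""
    hits = []
    signal = None
    for line in text.splitlines():
        found = SIGNAL_RE.search(line)
        if found:
            if signal is not None:
                # Previous signal had no usable backtrace frame
                hits.append((signal, None))
            signal = found.group(1)
            continue
        frame = FRAME_RE.match(line.strip())
        if signal is not None and frame and frame.group(1) != "???":
            hits.append((signal, frame.group(1)))
            signal = None
    if signal is not None:
        hits.append((signal, None))
    return hits


def scan_rsl_dir(run_dir="."):
    """Map each rsl.* file name in run_dir to the crashes found in it."""
    results = {}
    for path in sorted(Path(run_dir).glob("rsl.*")):
        hits = scan_rsl_text(path.read_text(errors="replace"))
        if hits:
            results[path.name] = hits
    return results


if __name__ == "__main__":
    for name, hits in scan_rsl_dir().items():
        for signal, frame in hits:
            print(f"{name}: {signal} in {frame or 'unknown frame'}")
```

Run it from the directory containing the rsl files. Any custom text written by your print statements can be searched for in the same files.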
 