WRFV4.0.2 LES-run

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

slmeyer

Hi all,

At the moment I am trying to set up a WRF-LES run for a real-data case at a spatial resolution of 200 m using WRF 4.0.2, with the setup shown in the attached namelist.

Short overview: 4 Domains (9km, 3km, 1km and 200m) for 15 days in May/June 2018 in Europe, with the inner domain being run with LES.
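
As a quick sanity check on the nest geometry, the parent-to-nest ratios implied by these grid spacings can be computed as below. This is only an illustrative sketch: the function name nest_ratios is made up for this example, and the values are taken from the overview above rather than from the attached namelists. To my understanding WRF expects odd integer ratios for real-data nests, which 3, 3 and 5 satisfy.

```python
def nest_ratios(dx_m):
    """Return parent-to-nest grid ratios for a list of grid spacings in metres."""
    ratios = [1]  # the outermost domain has no parent
    for parent, child in zip(dx_m, dx_m[1:]):
        ratio, remainder = divmod(parent, child)
        if remainder != 0:
            raise ValueError(f"{parent} m is not an integer multiple of {child} m")
        ratios.append(ratio)
    return ratios


# Grid spacings from the overview: 9 km, 3 km, 1 km and 200 m
dx_m = [9000, 3000, 1000, 200]
print(nest_ratios(dx_m))  # [1, 3, 3, 5]
```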

To set up the 4th domain I have tried two approaches: 1.) a full run starting at D01 with one-way nesting, and 2.) ndown, using the output from D03 as the boundary. The two approaches fail with different errors.

1.) Full run:

For the full run I am using the attached namelist.input.full_run. After ~20 s of run time the run stops with the following output:

Code:
[…]
INITIALIZE THREE Noah LSM RELATED TABLES
d03 2018-05-24_00:00:00 start_domain_em: After call to phy_init
start_em: initializing avgflx on domain   4
d03 2018-05-24_00:00:00 start_em: calling lightning_init
d03 2018-05-24_00:00:00 start_em: after calling lightning_init
d03 2018-05-24_00:00:00 calling inc/HALO_EM_INIT_1_inline.inc
d03 2018-05-24_00:00:00 calling inc/HALO_EM_INIT_2_inline.inc
d03 2018-05-24_00:00:00 calling inc/HALO_EM_INIT_3_inline.inc
d03 2018-05-24_00:00:00 calling inc/HALO_EM_INIT_4_inline.inc
d03 2018-05-24_00:00:00 calling inc/HALO_EM_INIT_5_inline.inc
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
wrf.exe            000000000316841D  for__signal_handl     Unknown  Unknown
[...]

Allocating more memory for the job (I am already using ulimit -s to increase the stack size) and decreasing the time step both yield the same result.

2.) NDOWN:

The second approach was to first run all mesoscale domains (D01 – D03) and then run the 4th domain, creating wrfbdy and wrfinput via ndown as described in the tutorial (http://www2.mmm.ucar.edu/wrf/OnLineTutorial/CASES/NestRuns/ndown.htm).

Creating all mesoscale domains works fine and gives the expected results. Running ndown also works fine and creates the needed files. Starting wrf for the LES domain, however, also crashes, after about 5 s of run time and every time at the same point:

Code:
[…]
INPUT LandUse = "USGS"
 LANDUSE TYPE = "USGS" FOUND          33  CATEGORIES           2  SEASONS WATER CATEGORY =           16  SNOW CATEGORY =           24
  returning from of landuse_init
  Do not have ozone.  Must read it in.
  Broadcast ozone to other ranks.
INITIALIZE THREE Noah LSM RELATED TABLES
forrtl: severe (66): output statement overflows record, unit -5, file Internal List-Directed Write
Image              PC                Routine            Line        Source             
wrf.exe            000000000315E2AE  for__io_return        Unknown  Unknown
wrf.exe            00000000031AE67A  for_write_int_lis     Unknown  Unknown
[...]
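
When a run produces many rsl.error.* files, it can help to pull out the runtime error together with the last message printed before it. The sketch below does that for Intel Fortran "forrtl: severe" lines like the two shown in this post. find_fatal_errors is a hypothetical helper written for this example, not part of WRF, and the sample text is copied from the ndown log above.

```python
import re

# Matches Intel Fortran runtime errors such as "forrtl: severe (66): ..."
FATAL = re.compile(r"forrtl: severe \((\d+)\): (.*)")


def find_fatal_errors(log_text):
    """Return (error_code, message, last_line_before_error) for each fatal error."""
    hits = []
    previous = ""
    for line in log_text.splitlines():
        match = FATAL.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2).strip(), previous.strip()))
        if line.strip():
            previous = line  # remember the last non-empty line seen
    return hits


sample = """INITIALIZE THREE Noah LSM RELATED TABLES
forrtl: severe (66): output statement overflows record, unit -5, file Internal List-Directed Write
wrf.exe            000000000315E2AE  for__io_return        Unknown  Unknown
"""
for code, message, before in find_fatal_errors(sample):
    print(code, "|", message, "| after:", before)
```

For the ndown log, this reports error 66 immediately after the "INITIALIZE THREE Noah LSM RELATED TABLES" message.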

This holds for the namelist.input.LES attached to this post. I also tried another option for sf_surface_physics: setting it to 1 instead of 2 makes the model run to the end. However, that run shows almost no variation of temperature over the day (i.e., no diurnal cycle whatsoever, which is unrealistic in this case), and in any case I would like to keep the option used before, i.e. 2.

Does anyone have a hint as to what causes this error and how I could solve it? I would prefer a way using ndown, but getting the full run to work would also be very nice.

Thanks a lot,
Sarah
 

Attachments

  • namelist.input.full_run.txt (8.1 KB)
  • namelist.input.LES.txt (6.4 KB)
Hi Sarah,

I am getting a SIGSEGV error similar to your full job for an LES run on a 250 m domain. I have been trying to diagnose the issue for a month now and am still not sure why it is occurring.

I think it has more to do with the underlying data than anything else because my same configuration that fails in one 30-hr case runs for another with no issue.

Andre
 
Hi Sarah (and Andre),
Grid distances >= 200 m (up to about 1 km) are a kind of "grey-zone", where using an LES option may work but will not correctly partition the resolved and sub-grid energy fractions. Such runs may also not work at all, depending on several other components of the set-up. Instead of running this in LES mode, it is actually recommended that you use the Shin-Hong PBL (option 11), which is based on YSU and designed for these particular grid distances. So this could be what is going on; however, a segmentation fault can also be caused by several other factors, such as the number of processors you're using. I would first recommend taking a look at this physics presentation, which is given at our WRF Tutorial each year and contains a lot of information regarding LES runs:
http://www2.mmm.ucar.edu/wrf/users/tutorial/201901/dudhia_physics.pdf

Perhaps you will want to make modifications to your runs and see if any of the problems are resolved. If not, we can dig a little deeper.
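
For reference, switching every domain to the Shin-Hong PBL amounts to a one-line change in the &physics section of namelist.input. The snippet below only formats that line. namelist_line is a hypothetical helper, the four-domain count comes from the original post, and any companion settings (surface-layer scheme, km_opt, diff_opt) should be checked against the WRF Users' Guide rather than taken from here.

```python
def namelist_line(name, values):
    """Format one per-domain namelist entry, e.g. ' bl_pbl_physics = 11, 11,'."""
    joined = ", ".join(str(v) for v in values)
    return f" {name} = {joined},"


max_dom = 4  # four domains, as in the original post
print(namelist_line("bl_pbl_physics", [11] * max_dom))
# prints: " bl_pbl_physics = 11, 11, 11, 11,"
```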
 
Hi Kevin,

I understand what you are saying about the grey zone but I have seen many groups do LES simulations in this range with success. Is the real issue the lack of vertical resolution in the LES domain?

Andre
 
Update: I changed bl_pbl_physics to 11 for domains 1-4 and the simulation still failed at a similar point in the integration with a SIGSEGV.
 
Hi Andre,
Okay, for this new failed run, can you please attach your new namelist along with all of your out/error files (e.g., rsl.error.*)? If there are a lot of them, please package them all into one *.tar file. Can you also let me know which version of the model you are running? You mention that this same set-up worked fine for another colleague. Was the only difference between your runs the input data? I.e., were the domains identical, with the exact same dates, identical namelists, the same computing environment, the same number of processors, etc.? If so, have you taken a look at your input data to see that everything looks reasonable?
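
One way to package the requested files is sketched below, using Python's standard tarfile module. The function name bundle_logs, the default file patterns, and the archive name in the usage note are assumptions for illustration; adjust them to whatever your run directory actually contains.

```python
import glob
import os
import tarfile


def bundle_logs(run_dir, tar_path,
                patterns=("rsl.error.*", "rsl.out.*", "namelist.input")):
    """Collect files matching the patterns in run_dir into one uncompressed tar file.

    Returns the list of file names that were added.
    """
    files = []
    for pattern in patterns:
        files.extend(sorted(glob.glob(os.path.join(run_dir, pattern))))
    with tarfile.open(tar_path, "w") as tar:
        for path in files:
            # Store only the file name so the archive unpacks flat
            tar.add(path, arcname=os.path.basename(path))
    return [os.path.basename(path) for path in files]
```

For example, calling bundle_logs(".", "rsl_files.tar") from the WRF run directory would write rsl_files.tar there.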

Thanks,
Kelly
 
Hi Kelly,

I am running v3.8.1 on a Cray cluster. Each node has 128 GB of RAM. I ran the same namelist with all five domains (down to 50 m) on another case with no issues. The case I am having trouble with also finished without problems with LSM=1 and LSM=4.

For whatever reason I cannot upload my tar file at all. I have been trying for quite some time. I don't know if the problem is on my end or not. The file size is quite small.
 
Andre,
Hmm, I'm not sure why you're unable to attach files here. Is it just hanging, or are you getting an error when trying to do so? If you get an error related to the file extension name, add .txt to the end and see if that helps. You can also take a look at the home page for this forum, and in the top section there is information on uploading to a different source:
http://forum.mmm.ucar.edu/phpBB3/

This is typically only necessary for larger files, but perhaps it will help you to get around your upload problem.

Kelly
 
Hi Kelly,

It must be an issue with the firewall I am behind. I get HTTP errors when uploading to the thread, and even the Nextcloud site gives me problems.

I have resigned myself to running the job piecemeal with debug mode on and hope it will finish the whole run, but I am still troubled by the seg faults, as I feel they are somehow related to the choice of LSM.

Andre
 
Andre,
I'm sorry you're having trouble with the firewall. That is frustrating. If you are still having problems with this case, and you have another way to get the files to me (e.g., Google Drive, Dropbox, a web page, etc.), let me know!
 
Hello,
Thanks a lot for your help. A full run with the suggested physics option seems to work for me.

Sarah
 