
wrf stops after ndown

kfha

New member
Hello,

I have run WRF 4.4.1 (with dm+sm) successfully for a 9-3-1 km nested domain. I am now trying to use hourly output for the inner 1 km domain together with ndown to downscale a small part of my inner domain to 333 m resolution. I have successfully created met_em files as well as a wrfinput file for the new 333 m domain using real.exe. I renamed the wrfinput file to wrfndi_d02 and checked that the fields look reasonable. I ran ndown successfully and renamed the ending of the wrfinput and wrfbdy files to d01 before I ran srun ./wrf.exe in a batch script on slurm for my 333 m domain only. After some time (a few minutes to a few hours, depending on how many processors I choose) the job stops, and only the first output file is created. I cannot find any error in the rsl.error files.
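For reference, the sequence of steps I follow is roughly this (file names as in my setup; the _d0x suffixes come from the domain numbering in the ndown namelist):

```
# real.exe creates wrfinput for the 333 m domain from the met_em files
./real.exe

# rename so ndown.exe reads it as the fine-grid initial file
mv wrfinput_d02 wrfndi_d02

# ndown.exe builds wrfinput_d02/wrfbdy_d02 from the hourly 1 km wrfout files
./ndown.exe

# rename so wrf.exe treats the 333 m domain as d01
mv wrfinput_d02 wrfinput_d01
mv wrfbdy_d02 wrfbdy_d01

# run WRF for the 333 m domain only
srun ./wrf.exe
```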

The original run was with sst_update=1, but I don't know how to create new wrflowinp files with ndown. As I don't need updated SSTs for this new run, I changed to sst_update=0. I assume this is not part of the problem.

My original 9-3-1 km nested run had a time_step = 30 with a parent_time_step_ratio = 1, 3, 3. When running ndown for my 1 km - 333 m domains, I let time_step = 30 remain the same, but adjusted the parent_time_step_ratio to = 9, 3, which I think should be consistent for the 1 km domain in the original run (because 3x3=9).
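For reference, the time-stepping entries in my ndown namelist are sketched below (abbreviated to the values described above):

```
&domains
 time_step              = 30,
 max_dom                = 2,
 parent_time_step_ratio = 9, 3,
 /
```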

Do you have any idea on what's wrong?

Attached are the namelist (for ndown and for wrf afterwards) together with the first rsl.error file.

P.S: My domain contains complex terrain. I have used 3 smooth passes with the 1-2-1 smoothing option. In an earlier simulation, I successfully ran wrf for 3.6-1.2-0.4 km for almost the same region without any cfl errors. As the 0.4 km domain is very similar to my new 333 m domain, I expect it is possible to run my new 333 m domain in a stable way using ndown.
 

Attachments

  • namelist.input.wrfafterndown.txt (5.3 KB)
  • namelist.input.ndown.txt (5.4 KB)
  • rsl.error.0000 (8.1 KB)
Update:

I have now increased debug_level in the namelist to 100 to get more information in the rsl.error file (see updated rsl.error file attached). The simulation seems to stop after calling inc/HALO_EM_B_inline.inc for MYNNPBL. However, I cannot find anything in this file that explains the reason for the crash.

The post below suggests that there might be CFL errors even if it's not written in the rsl.error files:
WRFV4.0.1 model crash with YSU and MYNN PBL options

According to this post, it appears to be a more common problem in versions v4.0 and later, because the denser spacing of eta levels near the surface requires shorter time steps. As I am interested in understanding processes near the ground, I cannot increase the spacing of eta levels here. I am already smoothing 3 times with the 1-2-1 smoothing option and am unsure if I should smooth even more. My time step is 1 second and I don't think I can make it any shorter. I am not sure if this is related to CFL errors, and I do not understand how I can disable all wrf_error_fatal calls, as suggested by the post in the link above.

Best regards,
Kristine
 

Attachments

  • rsl.error.0000 (16.6 KB)
Kristine,

Your namelist.input.ndown indicates that this is not really an ndown run, because you are still running a nested case with max_dom=2.

After you process 1-km wrfout files, you should be able to create wrfinput and wrfbdy for the 333-m domain, and run a single domain case over this 333-m resolution domain.

Please take a look at the document here, which describes how ndown works.
 
Thank you for your response.

I think I haven't been clear enough in my first post. I am following the documentation you're linking to. My namelist.input.ndown is the namelist I use for real.exe and ndown.exe, where the documentation requires max_dom=2, with the two domains being the 1 km domain and the 333 m domain. After ndown has successfully created wrfinput_d02 and wrfbdy_d02, I rename these two files to wrfinput_d01 and wrfbdy_d01 and run wrf.exe with the namelist labeled namelist.input.wrfafterndown, which has max_dom=1 and settings for the 333 m domain only.
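Concretely, the domain settings in the two namelists are (abbreviated sketch):

```
! namelist.input.ndown -- used for real.exe and ndown.exe
&domains
 max_dom = 2,        ! d01 = 1 km, d02 = 333 m
 /

! namelist.input.wrfafterndown -- used for wrf.exe
&domains
 max_dom = 1,        ! the 333 m domain only
 /
```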

Have I misunderstood something?
 
Thank you for the clarification. You didn't do anything wrong; it was my fault for not looking at the correct namelist.input.
There are a few issues I am concerned about:
(1) run_days = 98
For such a long period simulation, please set sst_update =1
Note that run_days takes top priority and will override the start and end times you specify.
(2) For dx = 333.33 m, the maximum time step is 2 s. To be a little conservative, please set time_step = 1.5 s.
(3) with num_land_cat = 25, what is your landuse type input data? For Modis it is 20 or 21, for USGS it is 24.
(4) PBL scheme and LES mode both don't work well for the resolution of 333 m. Probably the SMS-3DTKE scale-adaptive LES/PBL scheme is a better option.
Please keep me updated about the output of your 333 m case. We don't have much experience running WRF at such a high resolution, and your feedback will help us better understand the model behavior. Thanks in advance.
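A sketch of these suggestions as namelist entries (option names and groups should be checked against the documentation for your WRF version, and num_land_cat must match your actual landuse dataset):

```
&time_control
 sst_update = 1,             ! (1) update SSTs for a long simulation
 /

&domains
 time_step           = 1,    ! (2) 1 + 1/2 = 1.5 s
 time_step_fract_num = 1,
 time_step_fract_den = 2,
 num_land_cat        = 24,   ! (3) e.g. 24 for USGS, 20/21 for MODIS
 /

&physics
 bl_pbl_physics = 0,         ! (4) SMS-3DTKE: PBL scheme off ...
 /

&dynamics
 km_opt   = 5,               ! ... with km_opt = 5 and diff_opt = 2
 diff_opt = 2,
 /
```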
 
Thank you for your suggestions.
1) I forgot to update run_days and have now set it equal to 6 in the namelist used for wrf.exe.
I haven't found any description of how to include sst_update for ndown. When running ndown.exe with sst_update=1 (and standard settings for auxinput4 in &time_control), I do not get the wrflowinp files I need for wrf.exe. How does sst_update=1 work with ndown?
2) I now use time_step=1 s in the namelist file for wrf.exe.
3) I use ESA-CCI as landuse input data, which is reclassified to 24-class using convert_geotiff. However, the convert_geotiff algorithm sets category_max in the index file equal to 25, with 25 being missing_value. I therefore need to set num_land_cat = 25 in the namelist. The maximum number for the LU_INDEX in my wrf files is still 24, so it seems like this missing value category is never used. I never had a problem with num_land_cat = 25 in earlier simulations.
4) This is a really useful tip. I tested this scheme, but the model still crashes with similar (lack of) error messages.

Attached is my updated namelist for wrf.exe together with the updated rsl.error file.
 

Attachments

  • rsl.error.0000 (14.4 KB)
  • namelist.input.wrfafterndown.txt (5.4 KB)
Update:

Do you think CFL errors could be the problem even if they are not printed in the rsl.error files, as suggested by the post I linked to above? Do you know how to test this by disabling the wrf_error_fatal calls (as mentioned in that post)?
 
Update:
I just found three other rsl.error files with some more information at the end:

calling inc/HALO_EM_TKE_5_inline.inc
Program received signal SIGSEGV: Segmentation fault - invalid memory reference


Is there anything specific with this subroutine that could cause a Segmentation fault?
 
CFL errors indicate that the model is numerically unstable. You need to reduce the time step, activate w_damping, and increase epssm (if your domain is located in a high-topography area) to make it stable.
Also, please turn off sst_update in your ndown run to see whether the model works.
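As a sketch, the stability-related entries mentioned above could look like this (the epssm value is illustrative; its default is 0.1 and it may need tuning for your terrain):

```
&dynamics
 w_damping = 1,      ! damp updrafts approaching numerical instability
 epssm     = 0.3,    ! off-centering of vertical sound waves; increase over steep terrain
 /
```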

I don't think that a higher debug level can give more information about the model run. Please set it to 0.

If the model keeps crashing, you may need to recompile WRF in debug mode, i.e., ./configure -D, and rerun the case. The log file will tell you exactly when and where the model crashes first, which will give you an idea of what the reason could be.
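The debug rebuild could look roughly like this (the compile target and log redirection follow the usual WRF build convention):

```
./clean -a
./configure -D              # debug build: traceback and bounds checking, no optimization
./compile em_real >& compile.log
```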
 