(RESOLVED) STOP in Noah-MP: emitted longwave <0; skin T may be wrong due to inconsistent input of SHDFAC with LAI

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

nalamirew

Dear all,

I am using WRF-4.2.1 for CMIP6 dynamical downscaling at a 25 km horizontal resolution. I use 4 different GCMs for my project and was able to run WRF for some years (~15). However, wrf.exe has recently started crashing for a reason that is not clear. Below are the last lines of the error message I get.

LBC for restart: Starting valid date = 2032-01-10_06:00:00, Ending valid date = 2032-01-10_12:00:00
LBC for restart: Starting valid date = 2032-01-10_12:00:00, Ending valid date = 2032-01-10_18:00:00
LBC for restart: Starting valid date = 2032-01-10_18:00:00, Ending valid date = 2032-01-11_00:00:00
LBC for restart: Starting valid date = 2032-01-11_00:00:00, Ending valid date = 2032-01-11_06:00:00
LBC for restart: Found the correct bounding LBC time periods for restart time = 2032-01-11_00:00:00
Tile Strategy is not specified. Assuming 1D-Y
WRF TILE 1 IS 39 IE 76 JS 1 JE 21
WRF NUMBER OF TILES = 1
d01 2032-01-11_06:00:00 Input data is acceptable to use:
emitted longwave <0; skin T may be wrong due to inconsistent
input of SHDFAC with LAI
46, 11 SHDFAC= 0.483922571 VAI= 5. TV= NaN TG= NaN
LWDN= 385.429077 FIRA= NaN SNOWH= 0.
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 2102
STOP in Noah-MP
-------------------------------------------

I noticed this was asked previously on this platform but couldn't find the solution. I am posting it again in the hope that someone who has experienced a similar issue can offer a solution.

regards,
Netsanet
 
Hi,
You mention that you've run many simulations without any problems. Was there any difference in the setup/namelists of those simulations besides the dates of the run? Were all the physics the same?

I notice the error message states:
Code:
TV= NaN TG= NaN
It may be necessary for you to do some debugging to determine where these NaN values are coming from for these 2 particular variables.
 
I haven't made any changes apart from the dates. It is a continuous run and it crashes at a particular date.
I will check where the NaNs come from by increasing the debug level in the namelist. But I would appreciate any suggestion of a specific method to figure out what is causing the NaNs.

thanks
Netsanet
 
Hi,
I would not recommend increasing the debug_level setting. This rarely provides any useful information, and will just add a lot of extra junk to your rsl files. Set that back to 0.

You should start with the file where the error originates, which is phys/module_sf_noahmplsm.F, line 2096 in v4.2.1. Above that, you'll see the write statements for TV and TG. You'll need to look above to see where that is coming from - it may be called in from a different subroutine (which may be in a different file). You can put some print statements in the code in different places to see where the value goes bad. You may also need to check your input data to make sure nothing is missing or bad there. If you're unfamiliar with print statements in Fortran code, you would use the syntax:
Code:
print *, "value of TV at location 1 = ", TV
You can put anything you want in the quotes. I often put my initials so that I can find it quickly in the rsl files. Once you modify the code with prints, you'll need to recompile, but you do NOT need to clean the code or reconfigure since you aren't modifying the registry. You simply recompile and it should be much quicker than a standard compile. You can then run tests and check the values in your rsl.out* files.
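
Unconditional prints inside the land-surface code can add a lot of output to the rsl files, so one option is to print only when a value is actually NaN. The following is a minimal standalone sketch of that idea, not code from module_sf_noahmplsm.F. The helper report_if_nan, the 'XX-DEBUG' tag, and the site number are hypothetical names chosen for illustration, and the indices 46, 11 are simply the point reported in the error message above. It relies on the standard ieee_arithmetic intrinsic module.

```fortran
program nan_guard_demo
  ! Sketch only: report_if_nan, the 'XX-DEBUG' tag, and the site number are
  ! hypothetical and do not exist in module_sf_noahmplsm.F.
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, ieee_quiet_nan
  implicit none
  real :: tv, tg

  tv = 285.3                            ! a healthy vegetation temperature (K)
  tg = ieee_value(tg, ieee_quiet_nan)   ! deliberately bad ground temperature

  call report_if_nan('TV', tv, 46, 11, 1)   ! prints nothing
  call report_if_nan('TG', tg, 46, 11, 1)   ! prints one tagged line

contains

  subroutine report_if_nan(name, val, iloc, jloc, site)
    character(len=*), intent(in) :: name
    real,             intent(in) :: val
    integer,          intent(in) :: iloc, jloc, site
    ! Write only when the value is NaN, so the rsl files stay readable.
    if (ieee_is_nan(val)) then
      print *, 'XX-DEBUG ', name, ' is NaN at site ', site, ' i,j = ', iloc, jloc
    end if
  end subroutine report_if_nan

end program nan_guard_demo
```

The same if (ieee_is_nan(...)) test could be wrapped around a print at each location of interest in the model source. Searching the rsl.out* files for the tag then shows the first site where the value goes bad.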
 
Hi Kwerner
Thank you for your suggestions.
I have made changes to the code, recompiled, and run WRF. Here are the changes I made to the code (attached).

On line 2051
IF(ILOC.EQ.46 .AND. JLOC.EQ.11)print *, ILOC, JLOC, 'CHECKING NaN:','fveg=',fveg,'irg=',irg,'irb=',irb,'irc=',irc

on lines 1122, 2046, 3802
print *, "value of TV at location 1 = ", TV

on line 1248
print *, "value of TV at location 2 = ", TV

Also attached is the rsl.out.0001 file, which contains the error message.

Any idea why my simulation is crashing?
regards,
Netsanet
 

Attachments

  • module_sf_noahmplsm.F
  • rslout.txt
Thanks for sending those. I'm still looking into this and haven't figured anything out yet.
1) Out of curiosity, if you run a restart simulation, starting just before the time it crashes, are you able to get past this, or do you see the same problem? I expect you will still see it, but I just wanted to check.
2) Can you attach your namelist.input file so that I can take a look at that?
Thanks!
 
Hi Kwerner

1. I did restart just before the time the model fails; the problem persists, as you expected.

2. Please find the namelist.input attached.

thank you
Netsanet
 

Attachments

  • namelist.input
Hi,
I suspect that something went wrong in this case before it eventually crashed.
Can you look at your wrfrst file, find the point where TV= NaN TG= NaN, and check relevant variables (T2, HFX, QFX, U10, V10, SOILT, SOIM, etc) at this point? Please let me know whether they all look reasonable.
 
Hi Ming

Thank you for your suggestions. I have checked the wrfrst files and wrflowinp files and there is nothing unusual in them.
It is frustrating that I can't figure out what is wrong while the model keeps failing. Any other suggestions, please?
regards,
Netsanet
 
Dear all

Out of desperation, I tried running the model with different history intervals. I normally write history output every 6 hours. When I change the history interval to every time step (150 s), the model fails immediately after writing the first output, i.e. after 2.5 min. With a history interval of 6 hours and everything else kept the same, the model fails immediately after writing the first output, i.e. after 6 hours. Finally, when I change the history interval to 1 day, again with everything else kept the same, the model keeps running without failing. So I am wondering, first, how the history interval affects the model run and, second, whether it will be an issue if I continue my run with a 1-day history interval (I only need daily variables at the end of my long simulation anyway).

regards,
Netsanet
 
After a lot of debugging, I realized that this problem is caused by the time step (dt) becoming zero in the surface modules. Note that I had adaptive time stepping switched on; when it is turned off, the model runs with no problem. So if others are having the same issue, this might be one thing to look into.
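
For anyone wanting to try the same workaround, adaptive time stepping is controlled in the &domains section of namelist.input. A minimal fragment, assuming the fixed 150 s step mentioned earlier in this thread and omitting every other entry, might look like this:

```
&domains
 time_step              = 150,      ! fixed step in seconds (value assumed from this thread)
 use_adaptive_time_step = .false.,  ! turn adaptive time stepping off
/
```

Whether a fixed step of this size is stable depends on the configuration, so treat the value as a placeholder rather than a recommendation.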

regards,
Netsanet
 
Hi,
Glad to see that you solved this problem. I saw in your reply that you need daily variables at the end of the simulation. I would like to know the table_id and frequency of the CMIP6 model data you chose: 6-hourly, daily, or monthly? I'm quite confused about this, and I would be grateful if you could reply. :)
 