
segmentation error after running WRF for 9 days - where to start looking for the reason

sowa

New member
Dear support team,

I am running WRF (coupled to a snow model) over Antarctica. 1 domain (200x200), 27km resolution, 90sec time step.
The model advances smoothly and writes its output files without problems until, approximately 9 days into the simulation, it crashes with exit code 143 or 139 (segmentation fault). Increasing epssm up to 0.4 and reducing itac to 0.1 did not help. w-damping is turned on in all simulations. The model time at which the crash occurs shifts by a few minutes between the settings I have tried.
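A side note on the two exit codes: by the usual shell convention, an exit status above 128 means the process was ended by signal number (status - 128). The small Python sketch below decodes them; `describe_exit_code` is a hypothetical helper name, not part of WRF. A status of 143 (SIGTERM) often just means the process was terminated from outside, for example by the MPI launcher or the job scheduler after another rank had already failed, but that interpretation is an assumption that depends on the system.

```python
import signal


def describe_exit_code(code):
    """Translate a shell exit status into a signal name where possible.

    Shells report "killed by signal N" as exit status 128 + N.
    """
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"unknown signal {code - 128}"
    return f"exit status {code}"


if __name__ == "__main__":
    for code in (139, 143):
        print(code, "->", describe_exit_code(code))
```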

I don't know where to start looking for the cause of the segmentation fault. Which variable should I look at? As a first step I investigated "W" and found strange (horizontal) features showing up in levels 20-60 (of 63 levels) (see screenshots). One of them coincides with steep topography; the other one does not.
I would appreciate any guidance on where to start looking, or on how to narrow the problem down.

I attach the rsl.error and namelist files.

Thank you for your help!
 

Attachments

  • namelist.input (12.1 KB)
  • rsl.error.0000 (404 KB)
  • Screenshot 2024-09-25 at 11.10.14.png (288.7 KB)
  • Screenshot 2024-09-25 at 11.10.40.png (264.2 KB)
There are a number of namelist options that are not included in standard WRF codes. I guess you have modified WRF? Please let me know if I am wrong.
 
Hi, the additional namelist options are for the snow model, which uses WRF output as its input/forcing. The WRF version used is ARW-WRF v4.2.1.
 
Hi,
Due to the limited resources we have at NCAR, we cannot support user-modified WRF. For your case, though, I have a few suggestions:
(1) Please look through your rsl files for possible error messages. Errors can appear in any of the rsl files, so you need to check all of them.
(2) Please save wrfout right before the model crashes, then look at these wrfout files to figure out when and where the first NaN appears.
(3) If necessary, compile the code in debug mode (./configure -D) and rerun this case (you can probably restart from a time shortly before the crash). In debug mode, the log file will tell you exactly when and where something first goes wrong.
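
For suggestion (1), checking every rsl file by hand gets tedious with many MPI ranks. Below is a minimal Python sketch that scans all rsl.* files in a run directory and reports the first few suspicious lines in each. The keyword list and the `scan_rsl` name are assumptions chosen for illustration, not an official list of WRF messages, so adjust the pattern to whatever your compiler and build actually print.

```python
import re
from pathlib import Path

# Assumed keywords; extend or trim to match your own logs.
PATTERN = re.compile(r"nan|fatal|error|cfl|segmentation|sigsegv|forrtl",
                     re.IGNORECASE)


def scan_rsl(run_dir=".", max_hits=5):
    """Return {file name: [(line number, text), ...]} for rsl.* files
    that contain at least one line matching PATTERN."""
    hits = {}
    for path in sorted(Path(run_dir).glob("rsl.*")):
        matches = []
        with open(path, errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if PATTERN.search(line):
                    matches.append((lineno, line.rstrip()))
                    if len(matches) >= max_hits:
                        break  # the first few hits per file are enough
        if matches:
            hits[path.name] = matches
    return hits


if __name__ == "__main__":
    for name, matches in scan_rsl(".").items():
        print(name)
        for lineno, text in matches:
            print(f"  line {lineno}: {text}")
```

The rank whose file shows the earliest problem is usually the most interesting one, since later messages in other files may only be a consequence of it.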
Hope this is helpful for you.
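
For suggestion (2), here is a minimal Python sketch of one way to locate the first NaN (or Inf) in a wrfout file. It assumes numpy and the netCDF4 Python package are installed, and that the variables of interest are stored with the time dimension first; the function names and the default variable list are hypothetical choices, not part of WRF.

```python
import numpy as np


def first_nonfinite(field):
    """Return the index tuple of the first NaN/Inf in `field`, or None.

    "First" means first in C order, so for a (Time, level, south_north,
    west_east) array the earliest time index wins.
    """
    # Masked (missing) values are treated as NaN as well.
    data = np.ma.filled(np.ma.asarray(field, dtype=float), np.nan)
    bad = ~np.isfinite(data)
    if not bad.any():
        return None
    flat_index = int(np.argmax(bad))  # position of the first True
    return tuple(int(i) for i in np.unravel_index(flat_index, data.shape))


def scan_wrfout(path, names=("W", "U", "V", "T")):
    """Return {variable name: index of first non-finite value} for one file."""
    from netCDF4 import Dataset  # assumed to be installed

    hits = {}
    with Dataset(path) as ds:
        for name in names:
            if name in ds.variables:
                hit = first_nonfinite(ds.variables[name][:])
                if hit is not None:
                    hits[name] = hit
    return hits
```

Calling `scan_wrfout` on each wrfout file in time order and stopping at the first non-empty result gives the output time, level, and grid point to inspect more closely. Writing output more frequently in the last hours before the crash makes that location more precise.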
 