Run stops suddenly without any error.

MKG

New member
Hi folks,

I am trying to run a one-week simulation nudged with ERA5. I successfully created the initial, lateral boundary, and FDDA files.
The run also starts successfully. However, the simulation stops after a few days.
I tried a continued (restart) run, but it stopped at the same step.

Could anyone advise me on how to fix this?
I have attached namelist.input and rsl.error files.

Thank you for your help,
Masaru
 

Attachments

  • rsl.error.0000
    1.5 MB · Views: 5
  • namelist.input
    4.2 KB · Views: 4
Hi,
I can't say for sure if this is the problem, but your domains are 274x274 and you're only using 11 processors. Is it possible for you to use more than that? You could probably use a few hundred and be okay. Additionally, because 11 is a prime number, the processor decomposition is 1 in the x direction and 11 in the y direction, which is not ideal. It doesn't have to be a perfect square, but closer to a square is probably better. If you're able to use more, try something like 100 (10x10) and see if that gets the simulation further. If not, please package all of your rsl files into a single *.tar or zipped file and attach that. Thanks!
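To illustrate the decomposition point, here is a minimal Python sketch that picks the factor pair of a processor count closest to a square. It only mimics the behavior described above (1x11 for 11 processors, 10x10 for 100); it is not WRF's actual decomposition code, and the function name closest_to_square is made up for this example.

```python
import math


def closest_to_square(nprocs):
    """Return (nx, ny) with nx * ny == nprocs and nx <= ny, as close to square as possible.

    Hypothetical helper for illustration; not taken from the WRF source.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be a positive integer")
    nx = math.isqrt(nprocs)  # start at the integer square root
    while nprocs % nx != 0:  # walk down until nx divides nprocs
        nx -= 1
    return nx, nprocs // nx


if __name__ == "__main__":
    for n in (11, 80, 100, 120):
        nx, ny = closest_to_square(n)
        print(f"{n} processors -> {nx} x {ny}")
```

A prime count such as 11 can only be split as 1 x 11, so each processor gets a long, thin strip of the 274x274 domain, whereas 80, 100, and 120 all factor into nearly square layouts.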
 
Hi kwerner,
Thank you for your advice. I tried 100 processes, but the run still stopped. I also tried 80 and 120 processes, with the same result.
I have attached the zipped rsl files.

Thank you again,
Masaru
 

Attachments

  • rsl.zip
    595.1 KB · Views: 3
Thanks for sending those, and for testing with additional processors. Unfortunately the rsl files didn't have much helpful information. You mentioned that you tried to do a restart, but it failed at the same time. I would recommend trying to get a restart file to output at a time close to (but right before) the time the model always stops. Then you can run some tests using that restart file and you don't have to run for so long before it stops. I would recommend these tests first:
1. Try running with a single domain to see if the issue is related to the nest.
2. Try running this with the default namelist (I've attached it here in case you don't still have the copy), of course changing the settings for the domain size, resolution, dates, etc., but keeping everything else as-is (i.e. don't use nudging, use the default physics options, nothing extra) to see if that makes any difference. If so, then you can start slowly adding back in some of your options to see which is the culprit.
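For the restart suggestion, restart files are written at multiples of restart_interval (in minutes, set in &time_control) counted from the start of the run. A rough Python sketch for checking which restart time a given interval would produce just before the failure is shown here. The start and failure times are made-up placeholders, not values from this thread, and last_restart_before is a hypothetical helper.

```python
import math
from datetime import datetime, timedelta


def last_restart_before(start, stop_time, restart_interval_minutes):
    """Return the last restart output time strictly before stop_time, or None.

    Assumes restart files are written every restart_interval_minutes,
    counted from the start of the run.
    """
    interval = timedelta(minutes=restart_interval_minutes)
    elapsed = stop_time - start
    # Number of whole intervals that end strictly before the model stops.
    n = math.ceil(elapsed / interval) - 1
    if n < 1:
        return None  # no restart file would be written before the stop
    return start + n * interval


if __name__ == "__main__":
    # Placeholder dates: replace with the real start and the time the model stops.
    start = datetime(2020, 7, 1, 0, 0)
    stop_time = datetime(2020, 7, 4, 7, 30)
    print(last_restart_before(start, stop_time, 360))
```

With these placeholder values and a 6-hour interval, the last restart file before the stop would be valid at 06:00 on the day of the failure, so each test run only needs to cover about an hour and a half of model time. The follow-up run would then use that time as its start date with restart = .true. in &time_control.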
 

Attachments

  • namelist.input.txt
    3.7 KB · Views: 1
rsl.error.0069 shows that you got a segmentation fault. In the vast majority of cases this is caused by numerical instability in the model, that is, a violation of the CFL criterion.

A solution that works in many cases: in the &dynamics section of namelist.input, add epssm and set it to around 0.5. If it still doesn't work, reduce the time step until it does.
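As a rough guide for choosing a smaller time step, the commonly cited WRF rule of thumb is that time_step in seconds should not exceed about 6 times the grid spacing in km. The Python sketch below applies that rule and computes a simple one-dimensional horizontal Courant number. The grid spacing and wind speed are assumed values for illustration, since the thread does not state them, and both function names are hypothetical.

```python
def max_recommended_time_step(dx_m):
    """Rule-of-thumb upper bound on time_step (s): 6 x grid spacing in km."""
    return 6.0 * dx_m / 1000.0


def courant_number(wind_speed_ms, dt_s, dx_m):
    """One-dimensional horizontal Courant number u * dt / dx."""
    return wind_speed_ms * dt_s / dx_m


if __name__ == "__main__":
    dx = 9000.0  # assumed grid spacing in metres (not given in the thread)
    wind = 50.0  # assumed peak horizontal wind speed in m/s

    dt_max = max_recommended_time_step(dx)
    print(f"rule-of-thumb maximum time_step: {dt_max:.0f} s")

    # Compare the Courant number at the rule-of-thumb step and at smaller steps.
    for dt in (dt_max, dt_max * 2 / 3, dt_max / 2):
        print(f"dt = {dt:.0f} s -> Courant number {courant_number(wind, dt, dx):.2f}")
```

This only covers horizontal advection. In practice, CFL violations in WRF are often triggered by strong vertical motion over steep terrain, so a time step below the rule-of-thumb value may still be needed.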

Hope this helps :)
 