
WRF simulation hangs at end of run and doesn't complete

fisky

Hello,

I'm running a 100 x 100 grid simulation at 1 km resolution for a 24-hour period, using GFS 0.25 degree input, for a forecasting application using WRF-ARW. The model hangs at the end and doesn't complete, and I need to press 'CTRL-C' to get out of it. All the rsl_out and rsl_error files say 'WRF completed successfully' at the end of the file, and the wrfout file is produced. The file seems corrupt, though.

I'm using a standard build with default physics. I'm on an AWS EC2 instance with 16 virtual CPUs, running in parallel on 15 of them. The CPUs seem to max out; however, memory usage doesn't seem excessive, with plenty left in the tank.

I have run the same domain using the same input data (0.25º GFS) at 5 km resolution (20 x 20 grid) on the same system in parallel using 4 CPUs, and it ran successfully.

I've attached the namelist.input file.

Any advice would be greatly appreciated.

Cheers,

Andrew
 

Update:

In fact, my SSH session to the EC2 instance gets shut down. I had successfully run the simulation at 5, then 3, then 2 km resolution prior to this run at 1 km. After trying to run with a 2-second timestep, I get the following after 2 hours:

starting wrf task 0 of 15
starting wrf task 1 of 15
starting wrf task 2 of 15
starting wrf task 3 of 15
starting wrf task 4 of 15
starting wrf task 5 of 15
starting wrf task 6 of 15
starting wrf task 7 of 15
starting wrf task 8 of 15
starting wrf task 9 of 15
starting wrf task 10 of 15
starting wrf task 11 of 15
starting wrf task 12 of 15
starting wrf task 13 of 15
starting wrf task 14 of 15
client_loop: send disconnect: Broken pipe
andrewfisk@Andrews-MBP ~/aws_cloud [255]>
 
Hi,
The issues you're having are very likely related to the fact that you're using 0.25 degree input and going straight down to a 1 km resolution domain. This is a ratio of about 28:1, which is much greater than you should use. We recommend nothing more than about a 7:1 ratio from the input data to the first WRF domain. If you want to use a 1 km domain, you will need to add a parent domain around it so that the transition between resolutions is smoother. You can try something like dx = 5000, 1000 with a parent_grid_ratio of 5 (5:1).
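
For reference, the ratios quoted above can be reproduced with a rough back-of-the-envelope calculation. This is only a sketch: it assumes one degree is about 111 km (true for latitude, shorter for longitude away from the equator), and the function name is made up for illustration.

```python
# Approximate length of one degree of latitude, in km (an assumption; a
# degree of longitude is shorter away from the equator).
KM_PER_DEGREE = 111.0


def input_to_domain_ratio(input_res_deg: float, domain_dx_km: float) -> float:
    """Ratio of input-data grid spacing to WRF domain grid spacing."""
    return input_res_deg * KM_PER_DEGREE / domain_dx_km


# 0.25 degree GFS straight to a 1 km domain: roughly 28:1, far above ~7:1.
direct = input_to_domain_ratio(0.25, 1.0)
# 0.25 degree GFS to a 5 km parent domain: below the ~7:1 guideline.
parent = input_to_domain_ratio(0.25, 5.0)
# 5000 m parent to 1000 m nest: this is the parent_grid_ratio.
nest = 5000 / 1000

print(f"GFS -> 1 km domain: {direct:.1f}:1")
print(f"GFS -> 5 km domain: {parent:.1f}:1")
print(f"5 km -> 1 km nest:  {nest:.0f}:1")
```

With a 5 km parent domain, both steps (input to parent, parent to nest) stay under the recommended ratio.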
 
Hi kwerner,

Thanks for your response. I'll give that a go.

Cheers!
 
Hi kwerner,

I've done as you advised and nested the 1000 dx domain inside a 5000 dx domain. The simulation runs for around two hours then hangs and I get the message 'client_loop: send disconnect: Broken pipe' and my SSH session to the AWS EC2 instance is terminated.

If I log back in, it appears the model ran successfully and I have usable wrfout files for both domains. All the rsl_error and rsl_out files note a successful completion of the run. It seems the model isn't shutting down cleanly?

I've attached my namelist.input file in case that is of interest...
 

Hi,
It sounds to me like your AWS session is simply expiring and logging you out, while WRF continues to run on the instance, probably because you submitted the job through a scheduler? If that's the case, you don't have to be active in your session for the model to continue running. If all the output is available, complete, and looks reasonable, and if the rsl files indicate the model finished correctly, then there shouldn't be any problem with the WRF simulation.
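
One way to do the rsl check after logging back in is to scan the files for the completion message. The sketch below assumes the per-task files are named rsl.error.* and that a finished run writes the text "SUCCESS COMPLETE WRF" to them; adjust both if your build differs. The function name is hypothetical.

```python
from pathlib import Path

# Assumption: a successful run ends each rsl file with this marker.
SUCCESS_MARKER = "SUCCESS COMPLETE WRF"


def rsl_files_missing_success(run_dir: str) -> list:
    """Return the names of rsl.error.* files that lack the success marker."""
    missing = []
    for path in sorted(Path(run_dir).glob("rsl.error.*")):
        # errors="replace" avoids failing on stray non-UTF-8 bytes in logs.
        text = path.read_text(errors="replace")
        if SUCCESS_MARKER not in text:
            missing.append(path.name)
    return missing
```

Calling rsl_files_missing_success(".") from the run directory returns an empty list when every task reported success. It also returns an empty list if no rsl files are found, so it is worth confirming the files exist first.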
 
Hi kwerner,

Thanks. I changed my SSH config to send a packet every 60 seconds so the connection stays active, and that solved the problem.
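
For anyone hitting the same disconnect, a keepalive like this can be set with the OpenSSH client options ServerAliveInterval and ServerAliveCountMax. The post does not say which option was actually changed, so treat this as one plausible setup rather than the exact fix used. The helper name and the host are placeholders.

```python
import shlex


def ssh_keepalive_command(host: str, interval_s: int = 60, max_missed: int = 3) -> list:
    """Build an ssh command that sends a keepalive probe every interval_s seconds."""
    return [
        "ssh",
        # Send a probe to the server after interval_s seconds of silence.
        "-o", f"ServerAliveInterval={interval_s}",
        # Give up after this many unanswered probes in a row.
        "-o", f"ServerAliveCountMax={max_missed}",
        host,
    ]


# "ec2-user@my-ec2-host" is a placeholder, not a real host.
print(shlex.join(ssh_keepalive_command("ec2-user@my-ec2-host")))
```

The same setting can be made permanent by putting a line such as "ServerAliveInterval 60" under the relevant Host entry in ~/.ssh/config.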

Thanks for your help.

Andrew
 