
WRF long simulation fail

Israt246

Hello, I am running a 13-month WRF simulation from 2013-08-01 00 UTC to 2014-09-01 00 UTC. The WRF version is 4.2.2 and the WPS version is 4.2. I am running WRF on the NCAR HPC system Derecho, using the precompiled WRF available there. My input data is ERA5 and I am using the restart option. The model initially ran without any problem and wrote its last restart file at 2014-06-17_00:00:00 (restart files were written every 40 days).

Now, when I restart the model with start date 2014-06-17_00:00:00, it keeps running until it reaches the 12-hour wall time, but it generates output files only up to 2014-06-27_00:00:00. After that, no new output files are generated. I re-ran the model and the outcome was the same: no output files after 2014-06-27_00:00:00, even though the model ran for the full 12 hours. I tried the command grep cfl rsl* and nothing related to CFL was printed, but I did see something related to a segmentation fault in the rsl.error.0000 file.
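A broader version of the grep check can be sketched in Python. This is only an illustration: the pattern names and the choice of messages to look for are assumptions taken from the messages quoted in this thread, not an official list of WRF diagnostics.

```python
import re
from collections import Counter
from pathlib import Path

# Illustrative patterns only; extend them to match what your build prints.
PATTERNS = {
    "cfl": re.compile(r"cfl", re.IGNORECASE),
    "segfault": re.compile(r"segmentation fault", re.IGNORECASE),
    "sigterm": re.compile(r"SIGTERM"),
    "fatal": re.compile(r"FATAL CALLED FROM FILE"),
    "warning": re.compile(r"\*\*WARNING\*\*"),
}


def scan_lines(lines):
    """Count how many lines match each pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_rsl_files(run_dir="."):
    """Scan every file named rsl.* in run_dir and report non-empty counts."""
    report = {}
    for path in sorted(Path(run_dir).glob("rsl.*")):
        with open(path, errors="replace") as handle:
            counts = scan_lines(handle)
        if counts:
            report[path.name] = dict(counts)
    return report


if __name__ == "__main__":
    for name, counts in scan_rsl_files().items():
        print(name, counts)
```

Run from the WRF run directory, this prints one line per rsl file that contains any of the listed messages, which makes it easier to spot the one rank that reported a problem.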

I checked the storage: I have used 5.85 TiB, and the allowed storage is up to 30 TiB. I have attached my namelist.input file, rsl.0000 file, and the bash script that I use to run wrf.exe. I would greatly appreciate any suggestions.
-Israt
 

Attachments

  • WRF_files.zip (760.6 KB)
Please tell me where your case is located on Derecho. I will take a look and get back to you. Thanks.
 
It seems that the files located at /glade/derecho/scratch/ijahan/era5_wrf_run/wrfout4.2.2 were overwritten today.

Did you rerun this case and somehow manage to get it done?
 
Yes, I tried to re-run the case after changing my restart interval from 40 days to 7 days. The job is running and, as in my previous two attempts, no output files are being created after wrfout_d02_2014-06-27_00:00:00. I am monitoring the rsl.error.0000 file right now, and it has been hanging at the following line for quite some time:
Timing for main (dt= 12.00): time 2014-06-27_23:13:14 on domain 2: 0.08737 elapsed seconds

The last output file, wrfout_d02_2014-06-27_00:00:00, was created about 3 hours after the job started running (wall time 12 hours). I would greatly appreciate your help in this matter, @Ming Chen.
PS: The job failed in the same way as the last two attempts after reaching the wall time. The rsl.error.0000 file shows forrtl: error (78): process killed (SIGTERM)
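To see where each domain stopped advancing, the "Timing for main" lines can be parsed. The following is a minimal sketch; the regular expression is an assumption based on the single line quoted above and may need adjusting for other WRF versions or for runs without the adaptive time step.

```python
import re
from datetime import datetime

# Matches lines such as:
# Timing for main (dt= 12.00): time 2014-06-27_23:13:14 on domain 2: 0.08737 elapsed seconds
TIMING = re.compile(
    r"Timing for main.*?time (\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})"
    r" on domain\s+(\d+):\s*([\d.]+) elapsed seconds"
)


def last_model_times(lines):
    """Return {domain: latest simulated datetime} from 'Timing for main' lines."""
    latest = {}
    for line in lines:
        match = TIMING.search(line)
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d_%H:%M:%S")
        domain = int(match.group(2))
        if domain not in latest or stamp > latest[domain]:
            latest[domain] = stamp
    return latest


if __name__ == "__main__":
    with open("rsl.error.0000", errors="replace") as handle:
        for domain, stamp in sorted(last_model_times(handle).items()):
            print(f"domain {domain}: last step at {stamp}")
```

If the reported time stays the same while the job is still running, the model is hanging rather than integrating slowly.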
 
I didn't see any error message in your rsl files. There might be some other reason for the failure of this case.
Please tell me where your run script is located (the script file you submit to run this case). Thanks.
 
Hi Ming, the script that I use to run wrf.exe is named submit_wrf_wrf and is located in the same directory:
/glade/derecho/scratch/ijahan/era5_wrf_run/wrfout4.2.2
 
Something is wrong with the time in wrflowinp_d02. In your rsl file, I found this message:

Code:
**WARNING** Time in input file not equal to time on domain **WARNING**
 Time in file: 2014-06-16_18:00:00
 Time on domain: 2014-06-17_00:00:00

Although this is just a warning message, I am concerned that the data might be messed up due to the wrong time.

Furthermore, sst_update is an option for the outermost domain when a nested case is run. Therefore, I don't think it is necessary to use wrflowinp_d02.

Can you remove wrflowinp_d02 and rerun this case? Please let me know whether it works. Thanks.
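The size of the time mismatch can be pulled out of the rsl file automatically. The sketch below assumes the warning always prints "Time in file" before "Time on domain", as in the excerpt above; the function name is hypothetical.

```python
import re
from datetime import datetime

STAMP = r"(\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})"
IN_FILE = re.compile(r"Time in file:\s*" + STAMP)
ON_DOMAIN = re.compile(r"Time on domain:\s*" + STAMP)


def _parse(stamp):
    """Convert a WRF-style time stamp to a datetime."""
    return datetime.strptime(stamp, "%Y-%m-%d_%H:%M:%S")


def time_mismatches(lines):
    """Return (file_time, domain_time, domain_time - file_time) per warning."""
    results = []
    file_time = None
    for line in lines:
        match = IN_FILE.search(line)
        if match:
            file_time = _parse(match.group(1))
            continue
        match = ON_DOMAIN.search(line)
        if match and file_time is not None:
            domain_time = _parse(match.group(1))
            results.append((file_time, domain_time, domain_time - file_time))
            file_time = None
    return results


if __name__ == "__main__":
    with open("rsl.error.0000", errors="replace") as handle:
        for file_time, domain_time, offset in time_mismatches(handle):
            print(f"file {file_time}, domain {domain_time}, offset {offset}")
```

For the warning quoted above, the offset is 6 hours: the input file is at 2014-06-16_18:00:00 while the domain is at 2014-06-17_00:00:00.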
 
Okay, I will remove the wrflowinp_d02. Just a few questions if you could please clarify:

1. You meant re-running wrf.exe with the latest restart file (not starting fresh from the beginning of the simulation), right?

2. The WRF model is forced with ERA5 data, and the ERA5 SST data is updated every 24 hours, but I am not using any external SST data. I found this in the user guide: "For long simulations, the model provides an alternative to read-in the time-varying data and to update these fields. In order to use this option, one must have access to time-varying SST and sea ice fields."

Can I still use sst_update when I have time-varying ERA5 SST data (without any other external SST data), or is this option only for external SST data?
I will update as soon as I re-run after removing wrflowinp_d02.
-Israt
 
Hi @Ming Chen,
I tried to re-run wrf.exe after removing wrflowinp_d02, and it failed with the following message in rsl.error.0000:
FATAL CALLED FROM FILE: <stdin> LINE: 314
Possibly missing file for = auxinput4

I have uploaded my rsl.error.0000 file.

Do you have any other suggestions?
 

Attachments

  • rsl.error.0000 (133.2 KB)
Please see my answers below:

1. You meant re-running wrf.exe with the latest restart file (not starting fresh from the beginning of the simulation), right?
Yes, that is correct.

2. The WRF model is forced with ERA5 data, and the ERA5 SST data is updated every 24 hours, but I am not using any external SST data. Can I still use sst_update when I have time-varying ERA5 SST data (without any other external SST data), or is this option only for external SST data?
Yes, you can use ERA5 SST data with the sst_update option. External SST data can also be used if the data is of high quality.
 
The case failed right at the beginning of the restart. This is because, in your wrfrst file, sst_update = 1 for all domains. This option from wrfrst has a higher priority than the one from namelist.input. Sorry, I was not aware of this priority issue.
In this case, I guess you have to run this case from the initial time. Please let me know if you still have problems.
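Before resubmitting, it may help to check that the files implied by sst_update are actually present. The sketch below is a simplified pre-flight check, not part of WRF. It assumes, as the failure above suggests, that each domain expects its own wrflowinp file, and it uses a very small line-based reader rather than a full Fortran namelist parser. The function names are hypothetical.

```python
import re
from pathlib import Path


def read_namelist_values(text):
    """Map 'key = v1, v2,' lines to {key: [v1, v2]}.

    Simplified reader: one assignment per line, comments after '!' dropped.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()
        match = re.match(r"^(\w+)\s*=\s*(.+)$", line)
        if match:
            items = [v.strip().strip("'\"") for v in match.group(2).split(",")]
            values[match.group(1).lower()] = [v for v in items if v]
    return values


def missing_lowinp_files(namelist_text, run_dir="."):
    """List expected auxinput4 files that are absent when sst_update = 1."""
    values = read_namelist_values(namelist_text)
    if values.get("sst_update", ["0"])[0] != "1":
        return []
    max_dom = int(values.get("max_dom", ["1"])[0])
    # Fall back to the conventional name if auxinput4_inname is not set.
    template = values.get("auxinput4_inname", ["wrflowinp_d<domain>"])[0]
    expected = [
        template.replace("<domain>", f"{dom:02d}") for dom in range(1, max_dom + 1)
    ]
    return [name for name in expected if not (Path(run_dir) / name).exists()]


if __name__ == "__main__":
    text = Path("namelist.input").read_text()
    for name in missing_lowinp_files(text):
        print("missing:", name)
```

Note that this only inspects namelist.input. It cannot tell what is stored in the wrfrst file, so it does not detect the priority issue described above.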


Hi Ming,
1. When I re-ran after removing wrflowinp_d02, I did not make any change to my namelist.input file, so it still had sst_update=1. When you suggested re-running without wrflowinp_d02, did you mean that I should first turn off sst_update for domain 2 in my namelist.input and then re-run?
2. Are you suggesting that I run WRF from the very beginning with sst_update turned off for domain 2, so that no wrflowinp_d02 file is needed? I am concerned about this approach, since the WRF outputs from 2013-08-01 (simulation start date) until 2014-06-27 were created just fine with the wrflowinp_d02 file present. The problem seems to occur when the model advances to 2014-06-28. So is it possible that the model is failing for other reasons? For example, I am using the adaptive time step and did not use w_damping.
Best,
Israt
 