./real.exe stopped unfinished problem

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

lslrsgis

Dear WRF Community,

I am running a one-year simulation for 2001 using GFS/FNL data as initial and boundary conditions. When running ./real.exe to generate wrfinput* and wrfbdy*, it stopped at 2001-12-23_00:00:00, loop 1423 of 1457, for domain 1 (d01).

I am using a dmpar-compiled real.exe and run it with the command: mpirun -np 8 ./real.exe. The met_em* files can be opened with ncview and seem to be complete.
------------------------------------------------------------------------------------------------------------------------------------------------
d01 2001-12-22_18:00:00 No average surface temperature for use with inland lakes
Assume Noah LSM input
d01 2001-12-22_18:00:00 Timing for processing 0 s.
d01 2001-12-22_18:00:00 Timing for output 0 s.
d01 2001-12-22_18:00:00 Timing for loop # 1422 = 0 s.
d01 2001-12-23_00:00:00 Yes, this special data is acceptable to use: OUTPUT FROM METGRID V4.0
d01 2001-12-23_00:00:00 Input data is acceptable to use:
metgrid input_wrf.F first_date_input = 2001-12-23_00:00:00
metgrid input_wrf.F first_date_nml = 2001-01-01_12:00:00
d01 2001-12-23_00:00:00 Timing for input 0 s.
d01 2001-12-23_00:00:00 flag_soil_layers read from met_em file is 1
Using sfcprs3 to compute psfc

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7F5B652F8697
#1 0x7F5B652F8CDE
#2 0x7F5B63E5C27F
#3 0x7F5B63F7B76F
#4 0x7F5B653C3A93
#5 0x6F76D4 in wrf_message_.part.0 at module_wrf_error.f90:?
#6 0x43117E in __module_initialize_real_MOD_lagrange_setup
#7 0x434389 in __module_initialize_real_MOD_vert_interp
#8 0x45CE57 in __module_initialize_real_MOD_init_domain_rk
#9 0x4791A1 in __module_initialize_real_MOD_init_domain
#10 0x418828 in med_sidata_input_
#11 0x419FB6 in MAIN__ at real_em.f90:?
------------------------------------------------------------------------------------------------------------------------------------------------
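As a cross-check on the loop counter in the log above, here is a minimal Python sketch (not part of WRF). It assumes 6-hourly met_em files and a run from 2001-01-01_12:00:00 to 2001-12-31_12:00:00; the interval and the end date are assumptions inferred from the 1457 loops, and all helper names are hypothetical. It also lists any expected met_em files that are missing from a directory.

```python
import os
from datetime import datetime, timedelta

FMT = "%Y-%m-%d_%H:%M:%S"  # date format used in met_em file names and the log


def expected_times(start, end, interval_hours=6):
    """Return every time level between start and end, both ends included."""
    t = datetime.strptime(start, FMT)
    t_end = datetime.strptime(end, FMT)
    step = timedelta(hours=interval_hours)
    times = []
    while t <= t_end:
        times.append(t)
        t += step
    return times


def loop_number(start, when, interval_hours=6):
    """1-based loop index of `when`, counted from `start`."""
    delta = datetime.strptime(when, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // (interval_hours * 3600)) + 1


def missing_met_em(directory, times, domain=1):
    """List expected met_em file names that do not exist in `directory`."""
    names = (f"met_em.d{domain:02d}.{t.strftime(FMT)}.nc" for t in times)
    return [n for n in names if not os.path.exists(os.path.join(directory, n))]


if __name__ == "__main__":
    times = expected_times("2001-01-01_12:00:00", "2001-12-31_12:00:00")
    print(len(times))  # 1457
    print(loop_number("2001-01-01_12:00:00", "2001-12-23_00:00:00"))  # 1423
    print(missing_met_em(".", times)[:5])  # first few missing files, if any
```

If the counts match the log and no files are reported missing, the set of met_em time levels is at least complete, although this says nothing about the contents of each file.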

Could anyone tell me why it stopped? The namelist.input, met_em files, and log files are attached. Thanks.

LSL
 

Attachments

  • rsl.package.tar.gz
    20.2 MB · Views: 55
  • namelist.input
    4.3 KB · Views: 60
  • met_em.package.tar.gz
    11.4 MB · Views: 50
Hi,
Could it be possible that you've run out of disk space in the directory/disk where you are trying to write the wrfinput/wrfbdy files? Another possibility could be the file size. The program can't write out files larger than 4 GB, so if the wrfinput_d0* and wrfbdy_d01 files are approaching that size, that could be the issue. It would be worth a test to run from a few days before the crash, through to the end of the simulation period to see if you're able to do that. If so, then you know it's not a data problem, but more of a size or disk problem.
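Both possibilities can be checked quickly. The following is a minimal Python sketch, not a WRF tool: it reports the free space on the disk holding the run directory and flags output files that are close to the 4 GB ceiling mentioned above. The function names and the 90% warning threshold are assumptions chosen for illustration.

```python
import os
import shutil

LIMIT_BYTES = 4 * 1024**3  # the 4 GB ceiling mentioned above


def free_gib(path="."):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def size_report(paths, limit=LIMIT_BYTES, warn_fraction=0.9):
    """Map each existing file to (size_in_bytes, is_near_limit)."""
    report = {}
    for p in paths:
        if os.path.exists(p):
            size = os.path.getsize(p)
            report[p] = (size, size >= warn_fraction * limit)
    return report


if __name__ == "__main__":
    print(f"free space: {free_gib('.'):.1f} GiB")
    files = ["wrfbdy_d01", "wrfinput_d01", "wrfinput_d02"]
    for name, (size, near) in size_report(files).items():
        flag = "NEAR LIMIT" if near else "ok"
        print(f"{name}: {size / 1024**3:.2f} GiB ({flag})")
```

Run it from the directory where real.exe writes its output; files that do not exist yet are skipped.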
 
Thanks for the reply.

(1) A 10-day ./real.exe test run spanning this date (2001/12/21 -> 2001/12/31) completes successfully. wrfinput_d01, wrfinput_d02, and wrfbdy_d01 are generated.

(2) <a>Another whole year, 2015, seems OK, with wrfbdy* and wrfinput* generated. Among these, wrfbdy_d01 is 2.3G.
[sliu@manager em_real_1y_2015_DEFAULT]$ ls -hl wrfinput_d0* wrfbdy*
-rw-r--r-- 1 sliu user 2.3G 10月 25 08:54 wrfbdy_d01
-rw-r--r-- 1 sliu user 6.6M 10月 25 08:45 wrfinput_d01
-rw-r--r-- 1 sliu user 13M 10月 25 08:54 wrfinput_d02

<b>By comparison, for the whole-year 2001 run, the incomplete wrfbdy_d01 is only 2.2G.
-rw-r--r-- 1 sliu user 6.6M 10月 29 03:05 wrfinput_d01
-rw-r--r-- 1 sliu user 2.2G 10月 29 03:15 wrfbdy_d01
Since the complete wrfbdy_d01 for 2015 is 2.3G, it seems strange that the 2001 run stopped near the end of the year.

Note: export WRFIO_NCD_LARGE_FILE_SUPPORT=1 is set in .bashrc.

(3) I deleted extra files to leave enough space for the ./real.exe run.
- The run for the other year, 2015, completes without any complaint about disk space;
- The run for 2001 has been repeated several times with the same result: it stops at 2001-12-23_00:00:00.
 
Two additional tests were conducted:

(1) 2001-01-02_12:00:00 -> 2001-12-31_12:00:00 (start postponed by one day from the original 2001-01-01_12:00:00). The ./real.exe program stopped at
2001-12-26_12:00:00.0000, which is loop #1433 out of 1453
-----------------------------------------------------------------------------

Domain 1: Current date being processed: 2001-12-26_12:00:00.0000, which is loop #1433 out of 1453
configflags%julyr, %julday, %gmt: 2001 360 12.0000000
d01 2001-12-26_12:00:00 Yes, this special data is acceptable to use: OUTPUT FROM METGRID V4.0
d01 2001-12-26_12:00:00 Input data is acceptable to use:
metgrid input_wrf.F first_date_input = 2001-12-26_12:00:00
metgrid input_wrf.F first_date_nml = 2001-01-02_12:00:00
d01 2001-12-26_12:00:00 Timing for input 0 s.
d01 2001-12-26_12:00:00 flag_soil_layers read from met_em file is 1
Using sfcprs3 to compute psfc
d01 2001-12-26_12:00:00 No average surface temperature for use with inland lakes
Assume Noah LSM input
points artificially set to land :

(2) 2015-01-01_12:00:00 -> 2015-12-31_12:00:00. Everything went well; wrfbdy_d01, wrfinput_d01, and wrfinput_d02 were generated. Among these, wrfbdy_d01 is ~2.3G.
(namelist.input screenshots attached for the two simulations, 2001 and 2015).

THANKS.
 

Attachments

  • Screenshot from 2019-11-02 20-38-29.png
    708.3 KB · Views: 1,636
  • Screenshot from 2019-11-02 20-39-44.png
    638.9 KB · Views: 1,634
Hi,
Because real.exe runs past the times at which it previously stopped when you use a different date range, this does not seem to be a problem with the data or the namelist; it points to your system or environment as the cause. Unfortunately, that is something you will need to discuss with your systems administrator. In the meantime, you could run real.exe twice and, when running wrf.exe, use restart files to start your run at the beginning of the second period. For example:
1) run real.exe from 1/1/2001 - 6/30/2001
You will get wrfbdy_d01, wrfinput_d01, and wrfinput_d02 files. Save those in a different directory, or as a different name, to keep them from being overwritten.
2) run real.exe from 7/1/2001 - 12/31/2001
You will get the same output files, but they will be for this time period. Again, save those elsewhere (or under a different name).
3) Run wrf.exe for however long you wish to run at one time (up to the end of June), but make sure to output restart files at an interval that would ensure that you will have one for 7/1/2001.
4) Run wrf.exe as a restart (restart = .true.) starting at 7/1/2001, using your wrfrst* file from that time, along with the wrfbdy_d01 file for that time.
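The steps above can be sketched in Python as a small generator for the date-related part of &time_control. This is an illustration only, under stated assumptions: the helper names are hypothetical, the two periods are assumed to share the time level 2001-07-01_00:00:00, and restart output times are assumed to be counted from the start of the run. The entries start_year ... end_hour, restart, and restart_interval (in minutes) are standard namelist.input variables, but the output here is a fragment to merge by hand, not a complete &time_control section.

```python
from datetime import datetime


def restart_lands_on(start, target, interval_min):
    """True if a restart file would be written exactly at `target`,
    assuming restart times are counted from `start`."""
    elapsed_min = (target - start).total_seconds() / 60
    return elapsed_min >= 0 and elapsed_min % interval_min == 0


def time_control_fragment(start, end, n_domains=2, restart=False,
                          restart_interval_min=720):
    """Render the date and restart entries of &time_control as text."""
    def row(name, values):
        return f" {name:<20}= " + " ".join(f"{v}," for v in values)

    lines = []
    for prefix, t in (("start", start), ("end", end)):
        lines.append(row(f"{prefix}_year", [f"{t.year:04d}"] * n_domains))
        lines.append(row(f"{prefix}_month", [f"{t.month:02d}"] * n_domains))
        lines.append(row(f"{prefix}_day", [f"{t.day:02d}"] * n_domains))
        lines.append(row(f"{prefix}_hour", [f"{t.hour:02d}"] * n_domains))
    # restart and restart_interval take a single value, not one per domain
    lines.append(row("restart", [".true." if restart else ".false."]))
    lines.append(row("restart_interval", [restart_interval_min]))
    return "\n".join(lines)


if __name__ == "__main__":
    first_start = datetime(2001, 1, 1, 12)
    split = datetime(2001, 7, 1, 0)      # assumed shared time level
    last_end = datetime(2001, 12, 31, 12)

    # A daily interval misses 00:00 when the run starts at 12:00.
    print(restart_lands_on(first_start, split, 1440))
    print(restart_lands_on(first_start, split, 720))

    # Fragment for the second wrf.exe run (step 4).
    print(time_control_fragment(split, last_end, restart=True))
```

The check on the restart interval matters because the run starts at 12:00: if restart times are counted from the start, a 1440-minute interval would never produce a file at 2001-07-01_00:00:00, whereas a 720-minute interval would.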

A couple of notes:
1) Most people will output a restart file more often than every 6 months - something more like daily, weekly, monthly, etc.
2) Since you are running for a long time, you really should be using the sst_update option, along with some higher-resolution SST input data. We recommend this for anyone running longer than about 1 week.
3) Your d01 domains are too small. We always recommend having your domains at least 100x100 to be able to resolve anything realistic. Take a look at this link for guidance on setting up a reasonable domain: http://www2.mmm.ucar.edu/wrf/users/namelist_best_prac_wps.html
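To check note 3 against a namelist, a minimal Python sketch can read e_we and e_sn (the standard grid-dimension entries in &domains) and flag any domain below 100 points in either direction. The function names are hypothetical and the sample values are made up for illustration; they are not taken from the attached namelist.

```python
import re

MIN_POINTS = 100  # minimum grid dimension recommended in note 3


def grid_sizes(namelist_text):
    """Return [(e_we, e_sn), ...] with one pair per domain."""
    def ints(name):
        m = re.search(rf"^\s*{name}\s*=\s*([^\n!/]*)", namelist_text,
                      re.MULTILINE)
        return [int(v) for v in re.findall(r"\d+", m.group(1))] if m else []
    return list(zip(ints("e_we"), ints("e_sn")))


def small_domains(namelist_text, minimum=MIN_POINTS):
    """Return (domain_number, e_we, e_sn) for domains below `minimum`."""
    return [(i + 1, we, sn)
            for i, (we, sn) in enumerate(grid_sizes(namelist_text))
            if we < minimum or sn < minimum]


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    sample = "&domains\n e_we = 60, 121,\n e_sn = 50, 121,\n/\n"
    print(small_domains(sample))
```

To use it on a real file, pass the text of namelist.input (or namelist.wps) read from disk instead of the sample string.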
 
Thanks for the detailed guide. I have tried the restart solution, and it works. Now ./real.exe and ./wrf.exe can complete for one whole year.

The notes are critical. I will modify the namelist to change the outer domain size and add SST.
 