Unknown error when running real.exe

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

cococha

New member
Hi there,

The following error appears in one of my rsl.error files when running real.exe:

Code:
 $ tail -n 50 rsl.error.0000

 date 1999-01-02_13:00:00
 ds            1           1           1
 de          139          33           5
 ps            1           1           1
 pe           14          33           5
 ms           -4           1           1
 me           21          33           5
module_io.F: in wrf_write_field
real.exe: posixio.c:293: px_pgout: Assertion `*posp == ((off_t)(-1)) || *posp == lseek(nciop->fd, 0, 1)' failed.
forrtl: error (76): Abort trap signal

I cannot find any similar error on the forum.

I am running WRF with a very simple configuration forced by ERA5 on pressure levels. real.exe seems to work for a random number of time steps and then fails with the error above.

The output above comes from a run on 100 cores with 6 GB of memory each.
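
With this many ranks there is one rsl.error.* file per core, so a small script can help find which of them contain the abort. The following is only a sketch: the two marker strings are taken from the output above, and the function name `find_failed_ranks` is made up for illustration.

```python
import glob
import os

# Marker strings copied from the error output above; extend the tuple as needed.
FAILURE_MARKERS = ("Assertion", "forrtl: error")


def find_failed_ranks(run_dir="."):
    """Map each rsl.error.* file name to the lines that contain a failure marker."""
    hits = {}
    for path in sorted(glob.glob(os.path.join(run_dir, "rsl.error.*"))):
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with open(path, errors="replace") as handle:
            matched = [
                line.rstrip("\n")
                for line in handle
                if any(marker in line for marker in FAILURE_MARKERS)
            ]
        if matched:
            hits[os.path.basename(path)] = matched
    return hits


if __name__ == "__main__":
    for name, lines in find_failed_ranks().items():
        print(name)
        for line in lines:
            print("    " + line)
```

Run from the directory that holds the rsl files, it lists only the ranks whose logs contain one of the markers.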

Any ideas?

Thanks!
 
Hi,
Can you attach the namelist.input file you are using for this, and package up the rsl.error.* files into a single *.TAR file and attach that, as well? Thanks!
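
For anyone unsure how to bundle the files, one possible way is sketched below using Python's standard `tarfile` module. The function name `package_run_files` and the default archive name are arbitrary choices for illustration, not anything the forum requires.

```python
import glob
import os
import tarfile


def package_run_files(run_dir=".", archive="namelist_rsl.tar.gz"):
    """Bundle namelist.input and all rsl.error.* files into one gzipped tar archive."""
    members = sorted(glob.glob(os.path.join(run_dir, "rsl.error.*")))
    namelist = os.path.join(run_dir, "namelist.input")
    if os.path.isfile(namelist):
        members.insert(0, namelist)
    with tarfile.open(archive, "w:gz") as tar:
        for path in members:
            # Store files by base name so the archive unpacks flat.
            tar.add(path, arcname=os.path.basename(path))
    return [os.path.basename(path) for path in members]


if __name__ == "__main__":
    print(package_run_files())
```

The function returns the names it packed, which makes it easy to confirm that nothing was missed before attaching the archive.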
 
Sure, here it is. As I am doing some tests my current namelist is pretty standard...

thanks
 

Attachments

  • namelist_rsl.tar.gz
    47.5 MB · Views: 60
Hi,
I'm not sure exactly what is causing this error (it is a bit cryptic), but I can make some suggestions that may help.
1) Set debug_level = 0. This is an option that we have removed from the default namelist in recent versions of the code. We have found that it rarely provides any useful information and just adds a lot of junk to the rsl* files, making them difficult to read, or sometimes so large that they take up all the available disk space.
2) The real.exe process doesn't really need many processors. I typically only use 1 processor for a domain size of about 100x100. 100 processors is way too many. Try something between maybe 4 and 15 processors.
3) I also notice that your output may be going to a different directory. Check to make sure that you have permission to write to that directory, and that you have enough space there.

These suggestions may not solve the problem, but at the least, we will know that the current set-up isn't causing the problem. If you continue to have problems, please attach your new namelist.input and your new rsl* files. Thanks!
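
As a rough illustration of the check in suggestion 3, the sketch below tests whether a directory is writable and how much free space it has. The function name `check_output_dir` and the 1 GB default threshold are assumptions made for the example; pick a threshold that matches the size of your wrfinput and wrfbdy files.

```python
import os
import shutil


def check_output_dir(path, min_free_gb=1.0):
    """Report whether `path` is a writable directory with at least `min_free_gb` free."""
    if not os.path.isdir(path):
        return {"exists": False, "writable": False, "free_gb": 0.0, "ok": False}
    # Write and execute (search) permission are both needed to create files in a directory.
    writable = os.access(path, os.W_OK | os.X_OK)
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "exists": True,
        "writable": writable,
        "free_gb": free_gb,
        "ok": writable and free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    print(check_output_dir("."))
```

Note that this does not account for per-user disk quotas, which some clusters enforce separately from the free space on the file system.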
 
Hi,

I have run some tests with debug_level = 0 in the namelist, running real.exe with a sequentially increasing number of cores. Attached are the rsl files for runs from a single processor up to 10 processors.
My output (wrfbdy_d01 and wrfinput_d01) is generated in the same directory as ./real.exe, so I think this is not the problem.
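
The core-count sweep described above could be scripted roughly as follows. This is a sketch under assumptions: it assumes the MPI launcher is `mpirun` with the `-np` option, which may differ on a cluster that uses a batch scheduler, and the function names are invented for the example. By default it only prints the commands instead of running them.

```python
import subprocess


def real_command(n_cores, launcher="mpirun", exe="./real.exe"):
    """Build the argv list for one real.exe run on `n_cores` MPI ranks."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return [launcher, "-np", str(n_cores), exe]


def run_sweep(core_counts, dry_run=True):
    """Run real.exe once per core count; return {n_cores: exit code, or None on a dry run}."""
    results = {}
    for n in core_counts:
        cmd = real_command(n)
        if dry_run:
            print(" ".join(cmd))
            results[n] = None
        else:
            results[n] = subprocess.run(cmd).returncode
    return results


if __name__ == "__main__":
    run_sweep([1, 2, 4, 6, 8, 10])
```

Each run overwrites the rsl.error.* and rsl.out.* files of the previous one, so they need to be copied aside between runs if they are to be compared afterwards.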

Is there any way to write the real.exe outputs to another directory? Maybe something like history_outname for wrf.exe outputs. I just want to check whether it is a problem with the disk, which may be corrupted somehow.
Thanks!

Update 1: I have linked the whole WRF directory to another disk and the error persists...

Update 2: I have run a 10-core test, increasing the memory to 15 GB per CPU. It seems to work until the second domain. In fact, real.exe runs successfully when the second domain is removed (see namelist_rsl_8core_15GB_SUCCESS.tar.gz). After that I ran real.exe with different combinations of processors and no memory limit, but it always crashes at the beginning of the second domain. I am also attaching the rsl files of the 10-core 15 GB test. Any recommendations?
 

Attachments

  • namelist_rsl_2core.tar.gz
    17.8 KB · Views: 55
  • namelist_rsl_4core.tar.gz
    22.6 KB · Views: 53
  • namelist_rsl_6core.tar.gz
    50.8 KB · Views: 50
  • namelist_rsl_8core.tar.gz
    61.4 KB · Views: 51
  • namelist_rsl_single.tar.gz
    4.6 KB · Views: 50
  • namelist_rsl_10core.tar.gz
    292.4 KB · Views: 49
  • namelist_rsl_8core_15GB_SUCCESS.tar.gz
    198.5 KB · Views: 50
  • namelist_rsl_10core_15GB.tar.gz
    114.3 KB · Views: 52
Hi,
Can you attach a couple of time periods of met_em* files for d01 and d02 so that I can try to test this on my system? Package all the files together into a single *.TAR file. If that file is too large to attach, see the home page of this forum for instructions regarding sending large files. Thanks!
 
Hi,

Here are the met_em files for the first 24 hours of both domains. I have added the namelist.wps too.


Thanks!
 

Attachments

  • wps_files.tar
    69.1 MB · Views: 62
Hi,
Thanks for sending those. I ran a test on my system with your namelist.input file and your met_em* files and am able to run the real.exe program quickly without any problems. This seems to be a problem related to your system. Unfortunately you are going to need to discuss the issue with a systems administrator at your institution to see if they can help. If you are able to get it figured out, please update the post so that it may help others in the future. Thanks.
 