(RESOLVED) "sys-2 : UNRECOVERABLE error on system request" error on Cray Cluster

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

Chapacha

I am testing WPS 3.9.1 on a Cray XC40 cluster, driven by 0.5-degree GFS data. WPS 3.9.1 compiles successfully, but every time I run ungrib.exe I get the error messages below, and I do not know how to fix them. Could you please take a look and let me know if you have any suggestions?

Thanks!
Yongxin

......
Name of source model =>NCEP GFS Model GRID 4

sys-2 : UNRECOVERABLE error on system request
No such file or directory

Encountered during an OPEN of unit 13
Fortran unit 13 is not connected
Name of source model =>NCEP GFS Model GRID 4
Name of source model =>NCEP GFS Model GRID 4
Name of source model =>NCEP GFS Model GRID 4

sys-2 : UNRECOVERABLE error on system request
No such file or directory

Encountered during an OPEN of unit 13
Fortran unit 13 is not connected
Name of source model =>NCEP GFS Model GRID 4
srun: error: nid00343: task 15: Aborted
srun: Terminating job step 8671563.0
slurmstepd: error: *** STEP 8671563.0 ON nid00343 CANCELLED AT 2019-03-14T05:18:10 ***
srun: error: nid00343: tasks 0-11,13-14,16-31: Terminated
srun: error: nid00343: task 12: Aborted (core dumped)
srun: Force Terminated job step 8671563.0
 
Yongxin,

Without access to a Cray system, it is hard for me to tell exactly what is wrong. The error message says "No such file or directory". Can you figure out which file or directory the code is trying to access? That might be a starting point for working out what is wrong.
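One way to narrow down a "No such file or directory" error is to check the run directory for the inputs ungrib normally reads before launching it. The sketch below is only a rough aid and rests on assumptions: the file names (namelist.wps, a Vtable link, and the GRIBFILE.* links created by link_grib.csh) follow the usual WPS layout rather than anything shown in this thread, and the function name missing_ungrib_inputs is hypothetical. It does not identify which file unit 13 refers to.

```python
from pathlib import Path

# Assumed ungrib inputs: names follow the usual WPS layout, not this thread.
REQUIRED = ("namelist.wps", "Vtable")


def missing_ungrib_inputs(run_dir="."):
    """Return the expected ungrib inputs that cannot be found in run_dir."""
    run = Path(run_dir)
    # Path.exists() follows symlinks, so a dangling Vtable link counts as missing.
    missing = [name for name in REQUIRED if not (run / name).exists()]
    # link_grib.csh normally creates GRIBFILE.AAA, GRIBFILE.AAB, and so on.
    if not any(p.exists() for p in run.glob("GRIBFILE.*")):
        missing.append("GRIBFILE.*")
    return missing


if __name__ == "__main__":
    print(missing_ungrib_inputs() or "all expected inputs found")
```

If the list comes back empty and the error persists, the missing file is probably something else, for example an output or intermediate file that another process removed while ungrib was running.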
 
Thank you both so much for looking into this problem. I was indeed running ungrib.exe in parallel by setting ntasks=32; after I changed that to ntasks=1, ungrib.exe ran through without any problems. Also, when I was running ungrib.exe in parallel, my metgrid.exe was deleted from the run directory every time I ran ungrib.exe, but that no longer happens now that I have set ntasks=1. Thank you so much for fixing this issue!

Yongxin
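For anyone who launches WPS steps from a script, the fix described above (one task instead of 32) can be sketched as follows. ungrib is a serial program, so with ntasks=32 srun presumably started 32 independent copies that all worked on the same files in the run directory. The helper names serial_ungrib_command and run_ungrib are hypothetical, and the executable path ./ungrib.exe is an assumption about the run directory layout.

```python
import subprocess


def serial_ungrib_command(exe="./ungrib.exe"):
    """Build an srun command line that starts exactly one copy of ungrib."""
    # --ntasks=1 asks Slurm for a single task, so only one ungrib process runs.
    return ["srun", "--ntasks=1", exe]


def run_ungrib(run_dir="."):
    """Run ungrib once from run_dir (requires Slurm's srun on the PATH)."""
    # check=True raises CalledProcessError if ungrib exits with a nonzero status.
    return subprocess.run(serial_ungrib_command(), cwd=run_dir, check=True)
```

The same effect is obtained in a batch script by requesting a single task for the ungrib step, which is what resolved the problem in this thread.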
 