Error when running ungrib on ERA5 data on Cori (NERSC), but it works on Cheyenne

htan2013

Hello all,

I am trying to run ungrib on the ERA5 pressure-level (pl) data on Cori at NERSC. However, it always ends with this error:

-------------------------------------------------------------------------------
Name of source model =>ECMWF
slurmstepd: error: *** STEP 46256240.0 ON nid02872 CANCELLED AT 2021-08-30T13:19:13 ***
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
ungrib.exe 0000000020083CD4 for__signal_handl Unknown Unknown
libpthread-2.26.s 00002AAAACD6F2D0 Unknown Unknown Unknown
ungrib.exe 000000002000F17B Unknown Unknown Unknown
ungrib.exe 00000000200294EB Unknown Unknown Unknown
ungrib.exe 00000000200219D0 Unknown Unknown Unknown
ungrib.exe 0000000020012251 Unknown Unknown Unknown
ungrib.exe 000000002000AF92 Unknown Unknown Unknown
libc-2.26.so 00002AAAAD2D734A __libc_start_main Unknown Unknown
ungrib.exe 000000002000AEAA Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
ungrib.exe 0000000020083CD4 for__signal_handl Unknown Unknown
libpthread-2.26.s 00002AAAACD6F2D0 Unknown Unknown Unknown
ungrib.exe 000000002000F127 Unknown Unknown Unknown
ungrib.exe 0000000020029475 Unknown Unknown Unknown
ungrib.exe 00000000200219D0 Unknown Unknown Unknown
ungrib.exe 0000000020012251 Unknown Unknown Unknown
ungrib.exe 000000002000AF92 Unknown Unknown Unknown
libc-2.26.so 00002AAAAD2D734A __libc_start_main Unknown Unknown
ungrib.exe 000000002000AEAA Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)


My namelist:

&share
wrf_core = 'ARW',
max_dom = 3,
start_date = '2009-04-01_00:00:00','2009-04-01_00:00:00','2009-04-01_00:00:00'
end_date = '2009-04-07_00:00:00','2009-04-07_00:00:00','2009-04-07_00:00:00'
interval_seconds = 3600
io_form_geogrid = 2,
/

&geogrid
parent_id = 1, 1, 2,
parent_grid_ratio = 1, 3, 3,
i_parent_start = 1, 201, 78,
j_parent_start = 1, 79, 53,
e_we = 677, 328, 580,
e_sn = 571, 370, 700,
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!! IMPORTANT NOTE !!!!!!!!!!!!!!!!!!!!!!!!!!!!
! The default datasets used to produce the MAXSNOALB and ALBEDO12M
! fields have changed in WPS v4.0. These fields are now interpolated
! from MODIS-based datasets.
!
! To match the output given by the default namelist.wps in WPS v3.9.1,
! the following setting for geog_data_res may be used:
!
! geog_data_res = 'maxsnowalb_ncep+albedo_ncep+default', 'maxsnowalb_ncep+albedo_ncep+default',
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!! IMPORTANT NOTE !!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
geog_data_res = 'nlcd2011_9s+30s','nlcd2011_9s+30s','nlcd2011_9s+30s'
dx = 2250,
dy = 2250,
map_proj = 'mercator',
ref_lat = 45.382,
ref_lon = -85.609,
truelat1 = 45.382,
truelat2 = 0.0,
stand_lon = -85.609,
geog_data_path = '/global/cscratch1/sd/htan2013/WPS_GEOG'
/

&ungrib
out_format = 'WPS',
prefix = '3D',
/

&metgrid
fg_name = '3D','SFC'
io_form_metgrid = 2,
/

The machine is Cori at NERSC. I tested exactly the same data with the same script (same nodes and tasks) on Cheyenne, and it worked there.
On Cori, the error appears after all of the 3D:2009 files have been created. The log file shows that it occurs while ungrib is deleting the PFILE files.
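
Since the failure appears only after the output files have been written, one way to confirm that nothing is missing is to compare the files on disk with the list implied by the namelist. The following is a minimal sketch, not part of WPS: the helper names are hypothetical, and it assumes ungrib names its intermediate files PREFIX:YYYY-MM-DD_HH, which matches the 3D:2009 files mentioned above when interval_seconds is a whole number of hours.

```python
from datetime import datetime, timedelta
from pathlib import Path


def expected_ungrib_files(prefix, start, end, interval_seconds):
    """List the intermediate file names implied by the &share dates.

    Assumes the PREFIX:YYYY-MM-DD_HH naming pattern (hourly intervals).
    """
    fmt = "%Y-%m-%d_%H:%M:%S"
    current = datetime.strptime(start, fmt)
    last = datetime.strptime(end, fmt)
    step = timedelta(seconds=interval_seconds)
    names = []
    while current <= last:
        names.append(f"{prefix}:{current:%Y-%m-%d_%H}")
        current += step
    return names


def missing_files(names, directory="."):
    """Return the expected names that are not present in the directory."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # Values copied from the namelist.wps in this post.
    names = expected_ungrib_files(
        "3D", "2009-04-01_00:00:00", "2009-04-07_00:00:00", 3600
    )
    print(f"{len(names)} files expected")
    for name in missing_files(names):
        print("missing:", name)
```

Run from the directory that holds the ungrib output. If no file is reported as missing, the job was killed during cleanup rather than during decoding, which is worth mentioning when contacting NERSC support.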

I have attached the log file in case anyone can help.

Much appreciated,
Haochen
 


Haochen,
Unfortunately, this is a machine issue. We have no access to NERSC and cannot reproduce the error on NCAR's Cheyenne. Could you please talk to your system administrator and seek some advice from them? Please keep us updated if you figure out the problem. Thanks in advance.
 