Error when running ungrib on ERA5 data on Cori (NERSC), but it works on Cheyenne

htan2013
Hello all,

I am trying to run ungrib on the ERA5 pressure-level (pl) data on Cori at NERSC. However, it always ends with this error:

-------------------------------------------------------------------------------
Name of source model =>ECMWF
slurmstepd: error: *** STEP 46256240.0 ON nid02872 CANCELLED AT 2021-08-30T13:19:13 ***
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
ungrib.exe 0000000020083CD4 for__signal_handl Unknown Unknown
libpthread-2.26.s 00002AAAACD6F2D0 Unknown Unknown Unknown
ungrib.exe 000000002000F17B Unknown Unknown Unknown
ungrib.exe 00000000200294EB Unknown Unknown Unknown
ungrib.exe 00000000200219D0 Unknown Unknown Unknown
ungrib.exe 0000000020012251 Unknown Unknown Unknown
ungrib.exe 000000002000AF92 Unknown Unknown Unknown
libc-2.26.so 00002AAAAD2D734A __libc_start_main Unknown Unknown
ungrib.exe 000000002000AEAA Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
ungrib.exe 0000000020083CD4 for__signal_handl Unknown Unknown
libpthread-2.26.s 00002AAAACD6F2D0 Unknown Unknown Unknown
ungrib.exe 000000002000F127 Unknown Unknown Unknown
ungrib.exe 0000000020029475 Unknown Unknown Unknown
ungrib.exe 00000000200219D0 Unknown Unknown Unknown
ungrib.exe 0000000020012251 Unknown Unknown Unknown
ungrib.exe 000000002000AF92 Unknown Unknown Unknown
libc-2.26.so 00002AAAAD2D734A __libc_start_main Unknown Unknown
ungrib.exe 000000002000AEAA Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)


My namelist:

&share
wrf_core = 'ARW',
max_dom = 3,
start_date = '2009-04-01_00:00:00','2009-04-01_00:00:00','2009-04-01_00:00:00'
end_date = '2009-04-07_00:00:00','2009-04-07_00:00:00','2009-04-07_00:00:00'
interval_seconds = 3600
io_form_geogrid = 2,
/

&geogrid
parent_id = 1, 1, 2,
parent_grid_ratio = 1, 3, 3,
i_parent_start = 1, 201, 78,
j_parent_start = 1, 79, 53,
e_we = 677, 328, 580,
e_sn = 571, 370, 700,
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!! IMPORTANT NOTE !!!!!!!!!!!!!!!!!!!!!!!!!!!!
! The default datasets used to produce the MAXSNOALB and ALBEDO12M
! fields have changed in WPS v4.0. These fields are now interpolated
! from MODIS-based datasets.
!
! To match the output given by the default namelist.wps in WPS v3.9.1,
! the following setting for geog_data_res may be used:
!
! geog_data_res = 'maxsnowalb_ncep+albedo_ncep+default', 'maxsnowalb_ncep+albedo_ncep+default',
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!! IMPORTANT NOTE !!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
geog_data_res = 'nlcd2011_9s+30s','nlcd2011_9s+30s','nlcd2011_9s+30s'
dx = 2250,
dy = 2250,
map_proj = 'mercator',
ref_lat = 45.382,
ref_lon = -85.609,
truelat1 = 45.382,
truelat2 = 0.0,
stand_lon = -85.609,
geog_data_path = '/global/cscratch1/sd/htan2013/WPS_GEOG'
/

&ungrib
out_format = 'WPS',
prefix = '3D',
/

&metgrid
fg_name = '3D','SFC'
io_form_metgrid = 2,
/

The machine is Cori at NERSC. I tested exactly the same data with the same script (same number of nodes and tasks) on Cheyenne, and it worked there.
The error appears after all of the 3D:2009 files have been created. According to the log file, it occurs while ungrib is deleting the PFILE files.
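
One way to confirm that every intermediate file really was written before the job was killed is to list the file names implied by the &share settings and check for them on disk. The following is a minimal Python sketch, not part of WPS. It assumes ungrib names hourly output as prefix:YYYY-MM-DD_HH, which is the naming used when interval_seconds is a whole number of hours. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import Path

DATE_FMT = "%Y-%m-%d_%H:%M:%S"  # date format used in namelist.wps


def expected_ungrib_files(start, end, interval_seconds, prefix):
    """List the intermediate file names implied by the &share settings.

    Assumption: hourly naming (prefix:YYYY-MM-DD_HH), valid when
    interval_seconds is a multiple of 3600.
    """
    t = datetime.strptime(start, DATE_FMT)
    t_end = datetime.strptime(end, DATE_FMT)
    step = timedelta(seconds=interval_seconds)
    names = []
    while t <= t_end:
        names.append(f"{prefix}:{t.strftime('%Y-%m-%d_%H')}")
        t += step
    return names


def missing_files(names, directory="."):
    """Return the expected names that are not present in `directory`."""
    d = Path(directory)
    return [n for n in names if not (d / n).is_file()]


# Values taken from the namelist in this post.
names = expected_ungrib_files(
    "2009-04-01_00:00:00", "2009-04-07_00:00:00", 3600, "3D"
)
print(len(names), names[0], names[-1])
```

Calling missing_files(names, "/path/to/WPS/run/dir") with the real run directory would then show whether any time levels are absent; an empty list would be consistent with the failure happening only during cleanup.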

I have attached the log file in case anyone can help.
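
Because the attached log is large, it may help to pull out only the scheduler and Fortran runtime messages before reading it in full. Below is a small Python sketch that counts those lines and records where each kind first appears. The patterns are based only on the message formats shown in the error output above, and summarize_log is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns modelled on the messages quoted in this post (an assumption:
# other Slurm or Intel Fortran runtime versions may word them differently).
PATTERNS = {
    "slurm_cancel": re.compile(
        r"slurmstepd: error: \*\*\* STEP (\S+) ON (\S+) CANCELLED AT (\S+)"
    ),
    "forrtl_error": re.compile(r"forrtl: error \((\d+)\): (.+)"),
}


def summarize_log(lines):
    """Count matching lines and remember the first hit of each kind."""
    counts = Counter()
    first_hits = {}
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                first_hits.setdefault(name, (lineno, line.strip()))
    return counts, first_hits


# Demonstration on lines copied from the error output in this post.
sample = [
    "Name of source model =>ECMWF",
    "slurmstepd: error: *** STEP 46256240.0 ON nid02872 CANCELLED AT 2021-08-30T13:19:13 ***",
    "forrtl: error (78): process killed (SIGTERM)",
    "forrtl: error (78): process killed (SIGTERM)",
]
counts, first_hits = summarize_log(sample)
print(dict(counts))
print(first_hits["slurm_cancel"])
```

For the real log, the file object can be passed directly, for example `with open("slurm-46256240.txt", errors="replace") as f: summarize_log(f)`, so the whole file is never held in memory. The lines just before the first slurm_cancel hit are usually the most informative ones to show to the system's support staff.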

Much appreciated,
Haochen
 

Attachments

  • slurm-46256240.txt
    58.3 MB

Haochen,
Unfortunately, this appears to be a machine-specific issue. We have no access to NERSC and cannot reproduce the error on NCAR's Cheyenne. Could you please talk to your system administrators and ask them for advice? Please keep us updated if you figure out the problem. Thanks in advance.
 