(RESOLVED) IO quilt crash - cheyenne

cjones
Posts: 23
Joined: Sat Jun 23, 2018 12:48 am

(RESOLVED) IO quilt crash - cheyenne

Post by cjones » Thu Dec 17, 2020 10:36 pm

Hello,

I am running WRF on Cheyenne with this configuration: two grids at 8 km (536 x 481) and 1.6 km (796 x 796), Intel distributed-memory (dm) build, netCDF-4, and hourly output on the 1.6 km grid. The model runs fine on 60 nodes (2160 CPUs) without quilting.

However, I wanted to see whether performance could be improved somewhat, so I tried these settings with I/O quilting:

nproc_x = 45,
nproc_y = 47,

&namelist_quilt
nio_tasks_per_group = 9,
nio_groups = 5,

This combination totals 2160 CPUs (45 x 47 + 9 x 5 = 2115 + 45). The model integrates for about 12 hours and then exits with no apparent messages in the rsl.error logs.
Also, the wrfout files have reasonable sizes, but they are not viewable with ncview and the Times variable is missing.
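
For anyone checking similar settings, the task count can be sketched in a few lines of Python. This is only an illustration: the function name is made up for this example, and the 36 cores per node is inferred from the 60 nodes / 2160 CPUs quoted in this post.

```python
def total_mpi_tasks(nproc_x, nproc_y, nio_tasks_per_group, nio_groups):
    """Return (compute, io, total) MPI task counts for a quilted WRF run."""
    compute = nproc_x * nproc_y               # tasks that integrate the model
    io = nio_tasks_per_group * nio_groups     # tasks dedicated to output (quilting)
    return compute, io, compute + io


# Values from this post.
compute, io, total = total_mpi_tasks(45, 47, 9, 5)

# Assumption: 36 cores per node, inferred from 60 nodes = 2160 CPUs.
NODES = 60
CORES_PER_NODE = 36
available = NODES * CORES_PER_NODE

print(f"compute tasks: {compute}, I/O tasks: {io}, total: {total}")
print(f"cores requested from the scheduler: {available}")
if total != available:
    print("Mismatch: the job request does not match the namelist task count.")
```

The total has to match the number of MPI ranks the job actually launches, so the same comparison is worth repeating against the submission script.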

Any suggestions about what the problem might be are greatly appreciated.

cheers,
Charles.

davegill
Posts: 90
Joined: Mon Apr 23, 2018 9:03 pm

Re: IO quilt crash - cheyenne

Post by davegill » Fri Dec 18, 2020 9:41 pm

Charles,
Since this is on cheyenne, can you point to a directory that we can poke around in?
Dave Gill
NCAR/MMM

cjones
Posts: 23
Joined: Sat Jun 23, 2018 12:48 am

Re: IO quilt crash - cheyenne

Post by cjones » Fri Dec 18, 2020 10:37 pm

thanks Dave, it is in:

/glade/scratch/cjones/WRF-4.2.1/test/em_labfees

cjones
Posts: 23
Joined: Sat Jun 23, 2018 12:48 am

Re: IO quilt crash - cheyenne

Post by cjones » Wed Jan 27, 2021 5:48 am

Hi,
I figured out what the problem was: the number of nodes/CPUs was configured incorrectly in the Cheyenne submission script.
I am not sure whether I can mark the ticket as resolved myself.
Charles
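
A quick way to catch this kind of mismatch before submitting is to compare the ranks requested in the PBS script with what the namelist needs. The sketch below is hypothetical: the `select` line is an example in the usual PBS form, not the line from my actual script (which is not shown in this thread), and the function name is made up for illustration.

```python
import re


def requested_mpi_ranks(select_spec):
    """Return nodes * mpiprocs parsed from a PBS 'select=...' resource line."""
    nodes = re.search(r"select=(\d+)", select_spec)
    per_node = re.search(r"mpiprocs=(\d+)", select_spec)
    if nodes is None or per_node is None:
        raise ValueError("expected both select=<n> and mpiprocs=<n>")
    return int(nodes.group(1)) * int(per_node.group(1))


# Hypothetical example line; substitute the one from your own job script.
example_line = "#PBS -l select=60:ncpus=36:mpiprocs=36"

# Ranks the namelist needs: compute tasks plus quilting (I/O) tasks.
needed = 45 * 47 + 9 * 5

requested = requested_mpi_ranks(example_line)
if requested == needed:
    print(f"OK: {requested} ranks requested, {needed} needed")
else:
    print(f"Mismatch: {requested} ranks requested, {needed} needed")
```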

kwerner
Posts: 2287
Joined: Wed Feb 14, 2018 9:21 pm

Re: (RESOLVED) IO quilt crash - cheyenne

Post by kwerner » Wed Jan 27, 2021 7:44 pm

Charles,
Thanks for letting us know. I just added 'RESOLVED' to the title.
NCAR/MMM
