CONUS benchmark problems

Topics specifically related to running the model in an HPC environment
steveb
Posts: 2
Joined: Thu Oct 17, 2019 9:14 am

CONUS benchmark problems

Post by steveb » Thu Oct 17, 2019 9:17 am

Hi all,

I'm trying to run the 12km CONUS benchmark.

The run segfaults, and just before that it prints a warning that points are exceeding cfl=2. So it looks like the run is simply unstable, but I'd have assumed the benchmark was set up so that it would work?

This is using WRF 3.9.1.1. To run the benchmark I'm using the files from test/em_real, with the data files from http://www2.mmm.ucar.edu/WG2bench/conus12km_data_v3/ copied over the top (e.g. replacing namelist.input). I had to modify the namelist to add

use_baseparam_fr_nml = .t.
to the &dynamics section, but it is otherwise unchanged.

I'm then using:

mpirun ./wrf.exe
This is under Slurm, so the process placement isn't visible in the command, but it's on 8 nodes with 64 tasks per node = 512 cores/processes in total. I did try smaller runs first with the same result, but I can see there are benchmark results for 512 cores, so that seemed a reasonable place to start. Each node has 64 cores and 128 GB of RAM.

Example of rsl.error.* file with segfault:

Re: CONUS benchmark problems

Post by steveb » Fri Oct 18, 2019 1:46 pm

The forum software rejected the rsl.error.* listing with "Forbidden. Contains contacts. Message seems to be spam.", so I can't post the full thing.

The end of it was:

LBC for restart: Starting valid date = 2001-10-25_00:00:00, Ending valid date = 2001-10-25_06:00:00
LBC for restart: Found the correct bounding LBC time periods for restart time = 2001-10-25_00:00:00
Tile Strategy is not specified. Assuming 1D-Y
WRF TILE 1 IS 55 IE 81 JS 291 JE 300
WRF NUMBER OF TILES = 1
d01 2001-10-25_00:00:00 1997 points exceeded cfl=2 in domain d01 at time 2001-10-25_00:00:00 hours
d01 2001-10-25_00:00:00 MAX AT i,j,k: 67 295 34 vert_cfl,w,d(eta)= 2166.30884 -326.380646 1.25799999E-02

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
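
When there are hundreds of rsl.error.* files, it can help to pull out just the CFL warnings rather than reading each file by hand. The following is a minimal Python sketch, assuming the warning lines have the same layout as the two quoted above; the function name `summarize_cfl` and the regular expressions are illustrative and not part of WRF.

```python
import re

# Patterns modelled on the two warning lines quoted in this post (assumed layout).
COUNT_RE = re.compile(
    r"(\d+)\s+points exceeded cfl=(\d+(?:\.\d+)?)\s+in domain\s+(d\d+)"
)
MAX_RE = re.compile(
    r"MAX AT i,j,k:\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"vert_cfl,w,d\(eta\)=\s*(\S+)\s+(\S+)\s+(\S+)"
)


def summarize_cfl(lines):
    """Return one dict per 'points exceeded cfl' warning found in `lines`."""
    events = []
    for line in lines:
        m = COUNT_RE.search(line)
        if m:
            events.append({
                "domain": m.group(3),
                "points": int(m.group(1)),
                "threshold": float(m.group(2)),
            })
            continue
        m = MAX_RE.search(line)
        if m and events:
            # Attach the location of the maximum to the preceding warning.
            events[-1].update(
                ijk=(int(m.group(1)), int(m.group(2)), int(m.group(3))),
                vert_cfl=float(m.group(4)),
                w=float(m.group(5)),
                d_eta=float(m.group(6)),
            )
    return events


# The two lines from the listing above, used as sample input.
SAMPLE = [
    "d01 2001-10-25_00:00:00 1997 points exceeded cfl=2 in domain d01 "
    "at time 2001-10-25_00:00:00 hours",
    "d01 2001-10-25_00:00:00 MAX AT i,j,k: 67 295 34 "
    "vert_cfl,w,d(eta)= 2166.30884 -326.380646 1.25799999E-02",
]

events = summarize_cfl(SAMPLE)
for e in events:
    print(e)
```

To scan a real run, the same function could be fed the lines of each file matched by `glob.glob("rsl.error.*")`.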

kwerner
Posts: 2287
Joined: Wed Feb 14, 2018 9:21 pm

Re: CONUS benchmark problems

Post by kwerner » Tue Nov 12, 2019 6:44 pm

Hi,
I would first like to apologize for the long delay in responding to this inquiry; it seems to have been overlooked. Thank you for your patience.
Over the years there have been code updates that may have caused these particular input files and this particular namelist to become unstable due to CFL errors. The benchmark files you are using are more than a decade old and were created with a version of WRF much older than V3.9.1.1. I would advise decreasing your time step to something more like 4xDX (or maybe even 3xDX), where DX is the grid spacing in km and the time step is in seconds, to see if that helps to overcome the CFL error.
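
As a rough illustration of the arithmetic behind that suggestion, here is a small Python sketch. It assumes the grid spacing is given in metres, as `dx` is in the `&domains` section of namelist.input, and that `time_step` is a whole number of seconds; the helper name `suggested_time_step` is hypothetical.

```python
def suggested_time_step(dx_m, factor):
    """Time step in seconds from the 'factor x DX' rule, with DX taken in km.

    dx_m   -- grid spacing in metres (as written in namelist.input)
    factor -- multiplier applied to DX in km (e.g. 4 or 3)
    """
    dx_km = dx_m / 1000.0
    return int(factor * dx_km)


# 12 km CONUS benchmark grid.
for factor in (4, 3):
    ts = suggested_time_step(12000, factor)
    print(f"{factor}xDX -> time_step = {ts},")
```

For the 12 km grid this gives 48 s for 4xDX and 36 s for 3xDX; the chosen value would replace the existing `time_step` entry in `&domains`.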
NCAR/MMM
