speccing and pricing a new cluster

MikeZu

Hello all...

In my job I run a number of WRF configurations for wind energy forecasting purposes. An example includes CONUS at 9km, with several 3km and 1km nests of fairly significant sizes for different regions. Since this is for forecasting, we distribute them over multiple nodes to get good performance.

Our current cluster (circa 2016) is adequate, but will be moving off of support next year, and our standard practice is to replace at that time. So we need to begin the speccing and budgeting process this year. I'm sure that hardware capabilities have advanced since 2016, and I'm hoping for advice on whether we should stick with a similar configuration, or try something new.

Our current cluster is based on the HP Apollo 6000 system. We have 32 nodes, each with dual Intel Xeon E5-2690 v4 @ 2.60GHz. That is 14 cores per CPU, so 28 cores per node. We've got an Infiniband interconnect, but I'm not 100% sure of the flavor. I think it's HDR. There is plenty of memory per node (considerably more than is usually needed).

Our current "large" WRF v4.1 instance runs with decent performance using 400 cores (16 nodes, 25 cores per node), though we wouldn't mind if it were faster. My initial thinking would be to go with a similar configuration, except with updated Xeon CPUs and Infiniband. The system has been extremely reliable, with very few problems and minimal demands on IT support. We haven't always had that experience with other systems, and a trouble-free HPC system in the corporate world is a big plus.
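For anyone who wants to compare candidate node types against these numbers, here is a rough Python sketch of the core-count arithmetic. The class and function names are made up for illustration. The near-square rank grid is only an assumption about how one might lay out MPI ranks, not a statement of how WRF decomposes a domain, so check that against your own run logs.

```python
from dataclasses import dataclass
from math import isqrt


@dataclass(frozen=True)
class Cluster:
    """Hypothetical description of a homogeneous cluster."""
    nodes: int
    sockets_per_node: int
    cores_per_socket: int

    @property
    def cores_per_node(self) -> int:
        return self.sockets_per_node * self.cores_per_socket

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node


def job_footprint(cluster: Cluster, nodes_used: int, ranks_per_node: int) -> dict:
    """Summarize how much of the cluster one MPI job occupies."""
    if nodes_used > cluster.nodes:
        raise ValueError("job needs more nodes than the cluster has")
    if ranks_per_node > cluster.cores_per_node:
        raise ValueError("more ranks per node than physical cores")
    ranks = nodes_used * ranks_per_node
    return {
        "ranks": ranks,
        "idle_cores_per_node": cluster.cores_per_node - ranks_per_node,
        "fraction_of_cluster": ranks / cluster.total_cores,
    }


def near_square_factors(n: int) -> tuple[int, int]:
    """Return the factor pair (a, b) of n with a <= b and a as close to sqrt(n) as possible."""
    for a in range(isqrt(n), 0, -1):
        if n % a == 0:
            return a, n // a
    raise ValueError("n must be a positive integer")


# Numbers from the post: 32 nodes, dual 14-core CPUs, job on 16 nodes x 25 ranks.
current = Cluster(nodes=32, sockets_per_node=2, cores_per_socket=14)
footprint = job_footprint(current, nodes_used=16, ranks_per_node=25)

print(current.cores_per_node, current.total_cores)
print(footprint)
print(near_square_factors(footprint["ranks"]))
```

Swapping in the socket and core counts of a candidate node gives a quick view of how many nodes the same 400-rank job would need, and how many cores would sit idle on each node.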

On the other hand, I've also been hearing about HPC-caliber Ethernet interconnects now, as well as AMD processors. I heard the same thing regarding Ethernet and HPC in 2016, but nearly everybody I spoke with who actually ran WRF still suggested sticking with Infiniband for both performance and price.

So, any advice? Stick with a similar configuration, or explore something different?

Thanks,
Mike
 
Hi Mike,
From the WRF team, we have this FAQ post that may be helpful:
https://forum.mmm.ucar.edu/phpBB3/viewtopic.php?f=71&t=70

But perhaps others may have suggestions for you, as well!
 
Thanks for the response, but unfortunately that FAQ isn't really helpful for us. We've been running WRF for over 10 years, on a variety of hardware, and are fairly knowledgeable about what the FAQ goes over. I'm primarily interested in new trends in hardware, and whether they'll work well for us.

Our current hardware is configured in a way that is quite similar to Cheyenne (though much smaller), only we built ours first, and our CPUs have fewer cores. My main question is whether we should just stick with that type of configuration or explore other options (AMD vs Intel, Ethernet vs Infiniband, etc).

I'll probably check on more general purpose HPC forums. But thanks again for the response.
 