Hello all...
In my job I run a number of WRF configurations for wind energy forecasting purposes. One example is a 9 km CONUS domain with several sizable 3 km and 1 km nests covering different regions. Since this is for forecasting, we distribute the runs over multiple nodes to get good performance.
Our current cluster (circa 2016) is adequate, but it will move off support next year, and our standard practice is to replace hardware at that point. So we need to begin the speccing and budgeting process this year. I'm sure hardware capabilities have advanced since 2016, and I'm hoping for advice on whether we should stick with a similar configuration or try something new.
Our current cluster is based on the HP Apollo 6000 system. We have 32 nodes, each with dual Intel Xeon E5-2690 v4 CPUs @ 2.60GHz - 14 cores per CPU, so 28 cores per node. We've got an InfiniBand interconnect, though I'm not 100% sure of the flavor - I think it's HDR. Plenty of memory per node (considerably more than is usually needed).
Our current "large" WRF v4.1 instance runs with decent performance using 400 cores (16 nodes, 25 cores per node) - though we wouldn't mind if it were faster. My initial thinking would be to go with a similar configuration, just with updated Xeon CPUs and InfiniBand. The system has been extremely reliable, with very few problems and minimal demands on IT support - which hasn't always been our experience with past systems. A trouble-free HPC system in the corporate world is a big plus.
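For anyone curious how that 400-rank count maps onto the domain: with the default settings (nproc_x = nproc_y = -1 in the namelist), WRF splits the MPI ranks into a processor grid as close to square as it can. A minimal sketch of that factoring logic (my own illustration, not WRF's actual source):

```python
def closest_square_factors(n):
    """Split n MPI ranks into an (nproc_x, nproc_y) grid as close
    to square as possible, mimicking WRF's default decomposition
    when nproc_x = nproc_y = -1."""
    x = int(n ** 0.5)
    while n % x:          # walk down to the largest factor <= sqrt(n)
        x -= 1
    return x, n // x

print(closest_square_factors(400))  # 400 ranks -> (20, 20) patches
```

This is one reason "nice" rank counts like 400 are attractive: they factor into a balanced 20x20 grid, so each patch has a similar halo-to-interior ratio.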
On the other hand, I've also been hearing about HPC-caliber Ethernet interconnects now, as well as AMD processors. I heard similar claims about Ethernet for HPC back in 2016, but nearly everyone I spoke with who actually ran WRF still recommended InfiniBand on both performance and price.
So, any advice? Stick with a similar configuration, or explore something different?
Thanks,
Mike