bennett_wineholt
New member
Hello,
I've run WRF on private and public cloud platforms before and I'm looking for updated best practices for good simulation performance.
Specifically, I'm planning to run the CONUS 2.5 km benchmark (and likely the Hurricane Maria 1 km case as well) and then a Land Use / Land Cover science case at around 1 km grid resolution on the best-performing platform.
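For comparing runs across platforms I've been using a small timing parser along these lines. It's only a minimal sketch: it assumes the standard "Timing for main:" lines WRF writes to rsl.error.0000, and the file path is just an example.

```python
#!/usr/bin/env python3
"""Rough per-step timing summary from a WRF rsl.error.0000 file.

Minimal sketch: assumes the usual lines of the form
  Timing for main: time 2018-06-17_00:01:30 on domain 1: 2.12345 elapsed seconds
The default path below is just an example.
"""
import re
import statistics
import sys

TIMING_RE = re.compile(
    r"Timing for main: time \S+ on domain\s+(\d+):\s+([\d.]+) elapsed seconds"
)

def step_times(path):
    """Return {domain: [elapsed seconds per step, ...]} parsed from an rsl file."""
    times = {}
    with open(path) as f:
        for line in f:
            m = TIMING_RE.search(line)
            if m:
                times.setdefault(int(m.group(1)), []).append(float(m.group(2)))
    return times

if __name__ == "__main__":
    rsl = sys.argv[1] if len(sys.argv) > 1 else "rsl.error.0000"
    for domain, t in sorted(step_times(rsl).items()):
        print(f"domain {domain}: {len(t)} steps, "
              f"mean {statistics.mean(t):.3f} s/step, total {sum(t):.1f} s")
```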
For AWS, a recent (Oct 2022) benchmark is aws-hpc-tutorials/content/03-WRF at weather · aws-samples/aws-hpc-tutorials, running WRF v4.3.3 with dashboard provisioning, AWS ParallelCluster with a same-rack placement group, a Lustre parallel file system, 100 Gbps EFA networking, the Slurm scheduler, 96-CPU / 384 GB dual AMD EPYC 7003 instances, and the Intel compiler via Spack.
For GCP, a recent (Feb 2022) benchmark is rcc-apps/wrf at main · FluidNumerics/rcc-apps, running WRF v4.2 with Terraform provisioning or Cloud Marketplace click-to-deploy, an autoscaling cluster, a Lustre parallel file system, 100 Gbps Tier_1 networking, the Slurm scheduler, c2 instances with 60 vCPU / 240 GB Intel Cascade Lake, and the Intel compiler via Spack.
For Azure, a recent (Mar 2021) benchmark is azurehpc/apps/wrf at 5da70fef67d8f9ecdc730d2a30b1ac110dbefb63 · Azure/azurehpc, running WRF v4.1.3 with azhpc-build CLI provisioning, a virtual machine scale set cluster, a Lustre file system, 200 Gbps RDMA networking, the PBS scheduler, HBv2 instances with 120 vCPU / 456 GB AMD EPYC 7V12 (though newer HBv4 exists), and the Intel compiler via Spack.
For Jetstream2 (an OpenStack cloud platform), a recent (Sept 2022, single-node with this code) setup is science-gateway/vms/wrf at master · Unidata/science-gateway, running WRF v4.3 packaged in Docker after OpenStack CLI provisioning. It still needs clustering and a scheduler, uses local SSD and IDD data transfer, has a 100 Gbps network that may need security-group settings for MPI ports, offers up to 128 vCPU / 500 GB dual AMD Milan 7713 instances, and defaults to the GNU compiler.
Does anyone have recommendations for:
- improving performance on any or all of these particular platforms
- tips for scripting up controls or containerization for platform portability among these options (see the sketch after this list for the kind of wrapper I mean)
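On the portability point, the kind of control script I have in mind is roughly the following. It's only a sketch; the platforms.json file and its keys (launcher, ranks_flag, ranks, wrf_exe, extra_args) are hypothetical placeholders, not taken from any of the repos above.

```python
#!/usr/bin/env python3
"""Launch wrf.exe with a per-platform launcher command.

Sketch only: the platforms.json layout and its keys are hypothetical
placeholders, not taken from the AWS/GCP/Azure/Jetstream2 recipes above.
"""
import json
import shlex
import subprocess
import sys

def build_command(cfg):
    """Assemble e.g. 'srun -n 384 ./wrf.exe' or 'mpirun -np 96 ./wrf.exe'."""
    cmd = [cfg["launcher"]]                        # "srun", "mpirun", ...
    cmd += shlex.split(cfg.get("extra_args", ""))  # placement/affinity flags
    cmd += [cfg["ranks_flag"], str(cfg["ranks"]), cfg.get("wrf_exe", "./wrf.exe")]
    return cmd

if __name__ == "__main__":
    platform = sys.argv[1]                         # e.g. "aws", "gcp", "azure", "js2"
    with open("platforms.json") as f:
        cfg = json.load(f)[platform]
    cmd = build_command(cfg)
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```

The same wrapper could just as easily shell out to a container runtime on Jetstream2, which is partly why I'm asking whether people prefer containerizing or scripting the bare hosts.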
Thanks and regards,
Bennett
PS.
I am also interested in any CPU performance tweaks people are willing to share, including choice of OS, page-cache tuning, or KML (Kernel Mode Linux) patches.
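For reference, the kernel-side settings I mean are things like the ones below. This is a read-only sketch; which of these (if any) are worth changing for WRF I/O is exactly what I'm hoping to hear about.

```python
#!/usr/bin/env python3
"""Print a few Linux page-cache / hugepage settings relevant to I/O-heavy WRF runs.

Read-only sketch; it only reports current values, it does not change anything.
"""
from pathlib import Path

SETTINGS = [
    "/proc/sys/vm/dirty_ratio",
    "/proc/sys/vm/dirty_background_ratio",
    "/proc/sys/vm/swappiness",
    "/sys/kernel/mm/transparent_hugepage/enabled",
]

for path in SETTINGS:
    p = Path(path)
    value = p.read_text().strip() if p.exists() else "n/a"
    print(f"{path}: {value}")
```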