
WRF Cloud Deployment Best Practices


I've run WRF on private and public cloud platforms before, and I'm looking for updated best practices for good simulation performance.
Specifically, I'm planning to run the CONUS 2.5 km benchmark (and likely the Maria 1 km benchmark as well), followed by a Land Use Land Cover science case at roughly 1 km grid resolution on the best-performing platform.

For AWS, a recent (Oct 2022) benchmark is aws-hpc-tutorials/content/03-WRF at weather · aws-samples/aws-hpc-tutorials: WRF v4.3.3 with dashboard provisioning, a ParallelCluster placement group guaranteeing same-rack placement, a Lustre parallel file system, EFA networking at 100 Gbps, the Slurm scheduler, 96-CPU/384 GB dual AMD EPYC 7003 instances, and the Intel compiler via Spack.
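For reference, the build step there might be sketched roughly like this with Spack (a sketch only, not the tutorial's exact commands; the spec names and variant values are assumptions):

```shell
# Sketch: build WRF 4.3.3 with the Intel toolchain under Spack on the
# cluster head node. Spec names/variants are assumptions; check the tutorial.
spack install intel-oneapi-compilers
spack compiler find                     # register the freshly installed compilers
spack install wrf@4.3.3 build_type=dmpar %intel ^intel-oneapi-mpi
spack load wrf                          # put wrf.exe and friends on PATH
```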

For GCP, a recent (Feb 2022) benchmark is rcc-apps/wrf at main · FluidNumerics/rcc-apps: WRF v4.2 with Terraform provisioning (or Cloud Marketplace click-to-deploy), an autoscaling cluster, a Lustre parallel file system, Tier_1 networking at 100 Gbps, the Slurm scheduler, c2 instances with 60 vCPUs/240 GB (Intel Cascade Lake), and the Intel compiler via Spack.
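The Terraform path for that repo would look something like this (a sketch; the working directory and variable files are assumptions, so check the repo's README for the real layout):

```shell
# Sketch: provision the autoscaling Slurm cluster from the repo's Terraform.
# Directory and tfvars names are assumptions -- consult the repo README.
git clone https://github.com/FluidNumerics/rcc-apps.git
cd rcc-apps/wrf
terraform init
terraform plan -out=tf.plan    # review the cluster resources before creating them
terraform apply tf.plan
```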

For Azure, a recent (Mar 2021) benchmark is azurehpc/apps/wrf at 5da70fef67d8f9ecdc730d2a30b1ac110dbefb63 · Azure/azurehpc: WRF v4.1.3 with azhpc-build CLI provisioning, a virtual machine scale set cluster, a Lustre file system, RDMA networking at 200 Gbps, the PBS scheduler, HBv2 instances with 120 vCPUs/456 GB (AMD EPYC 7V12; newer HBv4 instances now exist), and the Intel compiler via Spack.
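The azhpc-build flow might be sketched as follows (hedged: the install script and config file name are assumptions based on the repo layout, not verified commands):

```shell
# Sketch: provision the VMSS cluster + Lustre with the azurehpc CLI.
# Script and config names are assumptions -- see the Azure/azurehpc docs.
git clone https://github.com/Azure/azurehpc.git
source azurehpc/install.sh     # puts the azhpc-* commands on PATH
cd azurehpc/apps/wrf
azhpc-build -c config.json     # build the cluster described by the config
```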

For Jetstream2 (an OpenStack cloud platform), a recent (Sept 2022, single node) run with this code is science-gateway/vms/wrf at master · Unidata/science-gateway: WRF v4.3 with Docker packaging after OpenStack CLI provisioning (clustering still needed), local SSD storage and IDD data transfer, a 100 Gbps network (security group settings may be needed for MPI ports), no scheduler yet, instances with up to 128 vCPUs/500 GB (dual AMD Milan 7713), and the GNU compiler by default.
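The security group part for MPI traffic might look like this (a sketch; the port range, CIDR, and names are assumptions — restrict them to your MPI's configured port range and your cluster subnet):

```shell
# Sketch: allow intra-cluster TCP so MPI ranks can connect between nodes.
# Port range, CIDR, and group/server names are assumptions for illustration.
openstack security group create mpi-internal
openstack security group rule create --protocol tcp \
  --dst-port 1024:65535 --remote-ip 10.0.0.0/24 mpi-internal
openstack server add security group my-wrf-node mpi-internal
```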

Does anyone have recommendations for:
- improving performance on any of these particular platforms
- scripting deployment controls or containerization for portability among these options
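For context, the kind of portability shim I have in mind is something like the sketch below (the function name and launcher choices are mine): pick whichever MPI launcher the current platform provides, so one run script works on Slurm, PBS, or a bare MPI install.

```shell
# Sketch of a portability shim: choose a launcher based on what's installed.
# Function name is mine; launcher flags shown are common defaults.
pick_launcher() {
  local np=$1
  if command -v srun >/dev/null 2>&1; then      # Slurm (AWS, GCP setups above)
    echo "srun -n $np"
  elif command -v qsub >/dev/null 2>&1; then    # PBS (Azure): run inside a job
    echo "mpiexec -np $np"
  else                                          # plain MPI (e.g. Jetstream2)
    echo "mpirun -np $np"
  fi
}
# usage: $(pick_launcher 96) ./wrf.exe
```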

Thanks and regards,

P.S. I am also interested in any CPU performance tweaks people are willing to share, including choice of OS, page-cache tuning, or KML (Kernel Mode Linux) patches.
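The sort of host-level checks I mean can be sketched as follows (standard Linux sysfs paths, but availability varies by image, so the snippet falls back to "unavailable" rather than failing; the pinning flags in the comment are Open MPI syntax and an assumption):

```shell
# Sketch: inspect two host settings that often matter for memory-bandwidth-
# bound codes like WRF. Paths are standard Linux sysfs; values vary by image.
thp=$(cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || echo unavailable)
gov=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null || echo unavailable)
echo "THP: $thp"
echo "governor: $gov"
# Rank pinning is often the biggest single win (Open MPI syntax shown,
# an assumption -- Intel MPI uses I_MPI_PIN* variables instead):
#   mpirun --bind-to core --map-by numa -np 96 ./wrf.exe
```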

Ming Chen

Staff member
Hi Bennett,
If you are asking how to get better WRF simulations, note that simulation quality depends on the physics and dynamics options chosen. The same case should produce the same results no matter where you run the model.
To get better simulations, there are a few general rules:
(1) the model domain should be large enough, and the lateral boundaries should not be located over areas of complex terrain
(2) high-quality forcing data should be used, and the lateral boundary conditions should be updated as frequently as possible
(3) physics options should be selected based on your specific case; model performance is often case-dependent
More details can be found in the document; please also refer to the literature for more information.

Once you have set up your case, I will be happy to look at your namelist.wps and namelist.input to make sure everything is fine.
Hi Ming,

Thank you for the reference. I have read through the material; however, simulation quality is not quite what I'm looking for.
For now I'm running the CONUS 2.5km benchmark verbatim.
What I want is advice for running this benchmark efficiently on public cloud resources.
A benchmark or namelist modification that runs at 1 km or finer grid resolution (optionally as a Land Use Land Cover simulation) would help me tune on any of these platforms.
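For concreteness, the kind of namelist.input change I mean is roughly this (illustrative values only; e_we/e_sn, nesting, and the physics choices would all need consistent rescaling along with the grid spacing):

```
&domains
 time_step = 6,      ! rule of thumb: time step (s) ~ 6 x dx (km)
 dx        = 1000,   ! CONUS 2.5 km benchmark uses 2500 (meters)
 dy        = 1000,
/
```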