
Inconsistent WRF results when using different numbers of cores

chenghao

New member
I am running a large-domain WRF v4.5.2 simulation on Derecho with varying core counts. The simulation setup is identical across runs except for the number of cores used. However, when comparing the output at the same timestamps, I noticed significant differences in the results depending on the core count. For instance, after a five-day period, the 2-m air temperature between runs can differ by as much as ±10 K in some grid cells. After several tests, I found that this discrepancy is caused by the optimization settings used during compilation: when I disable optimization, the results are identical regardless of the core count.

So why does the optimization during compilation result in different outcomes when varying the number of cores? Could this be related to the size of the decomposed tiles and their communication across cores?
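For intuition on why the decomposition matters: floating-point addition is not associative, so changing the tile layout (or letting the optimizer reorder and vectorize loops) changes the order in which the same numbers are combined, and the rounded results can differ. A minimal Python sketch, unrelated to WRF itself:

```python
# Toy illustration (not WRF code): floating-point addition is not associative,
# so summing identical numbers in a different order gives different answers.
# A different tile decomposition (or optimizer reordering) changes that order.
vals = [1e16, 1.0, -1e16, 1.0]

# Sequential left-to-right sum: the first 1.0 is lost to rounding
# because it is too small relative to 1e16.
left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]

# Regrouped sum: the huge terms cancel first, so both 1.0s survive.
regrouped = (vals[0] + vals[2]) + (vals[1] + vals[3])

print(left_to_right, regrouped)  # the two "identical" sums differ
```

Per-cell differences of this kind start at round-off size; the question of how they grow to ±10 K is a separate issue (see below in the thread).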

To ensure reproducibility, I would either need to consistently use the same number of cores and nodes, or disable optimization during compilation. However, disabling optimization will certainly slow down the simulations. Even when using the same core count, I’m still uncertain about the reliability of the results, as a ±10 K variation in air temperature is quite concerning and could affect conclusions drawn from model evaluation against observational data.

I would appreciate any suggestions or insights. Thanks!
 
We are well aware that using different numbers of processors will lead to slightly different model results. This is caused by higher levels of compiler optimization.

Running WRF with a low level of optimization can be expensive. As an alternative, we always suggest that users stay with the same number of processors.
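If you do pin the processor count, you can also fix the domain decomposition explicitly in `namelist.input`, so the patch layout does not depend on how the MPI launcher assigns tasks. A sketch of the relevant `&domains` entries (the values here are illustrative; `nproc_x * nproc_y` must equal your total number of MPI tasks):

```
&domains
 nproc_x = 8,    ! number of patches in the west-east direction
 nproc_y = 16,   ! number of patches in the south-north direction
/
```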

Hope this is helpful for you.
 
Thanks for your reply, Dr. Chen. I will use the same number of cores in the future.

I assume this "higher level of optimization" will only impact floating-point operations and communication between cores, correct? But I am seeing ±10 K variation in air temperature and ±100-200 W/m² variation in sensible/latent heat flux at some grid points. Is this expected when using different numbers of processors? Or does it indicate that the model is already becoming unstable (with certain core counts)?
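On the magnitude question: round-off-sized differences do not stay small in a chaotic system like the atmosphere; they grow roughly exponentially until they saturate at the size of the flow's natural variability, so large local differences after five days need not mean the model is unstable. A toy Python illustration (the logistic map, a standard chaotic example, not WRF physics):

```python
# Toy illustration (not WRF physics): in a chaotic system, a round-off-sized
# difference in the state grows until it is as large as the variability itself.
# The logistic map x -> r*x*(1-x) with r = 4 is a standard chaotic example.
r = 4.0
x, y = 0.3, 0.3 + 1e-15          # two states differing by ~machine epsilon

max_diff = 0.0
for step in range(60):
    x = r * x * (1.0 - x)
    y = r * y * (1.0 - y)
    max_diff = max(max_diff, abs(x - y))

print(max_diff)  # grows from ~1e-15 toward order 1 within a few dozen steps
```

The more meaningful check is whether domain-averaged or time-averaged statistics agree between the runs, rather than point values at a single timestamp.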
 