
Running WRF on different machines

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

Xun_Wang

Hi,

I am running WRFv4.1 on our HPC system. We currently have nodes with two different architectures (different CPUs and different mainboards).

I compiled WRFv4.1 on each architecture using the same versions of the libraries and the Intel compiler (the libraries and the compiler were themselves built separately on each architecture).

These two builds of WRFv4.1 gave different results (same namelist, same forcing data, same number of processors...). For U and V, the maximum difference reaches 10 m/s, and for T2 it reaches 4 K.

Of course, some small differences in the output can be expected, but I am not sure if the difference should be this large.
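
For anyone wanting to put a number on differences like these, here is a minimal sketch in plain Python. The function name and the sample T2 values are made up for illustration; with real runs the fields would be read from the two sets of wrfout files.

```python
def max_abs_diff(field_a, field_b):
    """Return the largest absolute pointwise difference and the index where it occurs."""
    if len(field_a) != len(field_b):
        raise ValueError("fields must have the same number of points")
    diffs = [abs(a - b) for a, b in zip(field_a, field_b)]
    # Index of the grid point with the largest difference.
    worst = max(range(len(diffs)), key=diffs.__getitem__)
    return diffs[worst], worst


# Hypothetical T2 values (K) at the same grid points from the two builds.
t2_build_a = [288.0, 290.5, 285.0, 279.5]
t2_build_b = [288.0, 291.0, 281.0, 279.5]

diff, index = max_abs_diff(t2_build_a, t2_build_b)
print(f"max |T2 difference| = {diff} K at point {index}")
```

Looking at where and when the largest difference occurs (rather than only its size) can help tell slow chaotic drift apart from a setup problem that shows up from the first output time.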

I just want to ask if there is anything during the compilation process I need to take care of, so that the two compilations can produce the "same" result.

Best regards,
Xun
 
Hi Xun,

In general, you cannot expect identical results when running on two different computers, because too many variables in the process can make a difference. The model is non-linear, so a change in even a single digit anywhere in the computation will produce different output, and the difference will grow as the integration proceeds. Unfortunately, even when you ensure that everything else is the same between the two compiles and runs, there isn't really anything that can be done to force bit-for-bit identical results. Other users have confirmed pretty significant differences in the past as well.
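
To illustrate the point with something much simpler than WRF, here is a small Python sketch. It is only an analogy: the logistic map stands in for a non-linear model, and the parameter value, starting point, and perturbation size are arbitrary choices made for the illustration.

```python
# Floating-point addition is not associative, so a compiler or CPU that
# evaluates a sum in a different order can change the last digit.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print("same last digit?", left == right)


def logistic_trajectory(x0, steps, r=3.9):
    """Iterate the logistic map x -> r*x*(1-x), a toy non-linear system."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = r * x * (1.0 - x)
        trajectory.append(x)
    return trajectory


# Two runs whose initial states differ only around the 15th decimal place.
run_a = logistic_trajectory(0.2, 200)
run_b = logistic_trajectory(0.2 + 1e-15, 200)

differences = [abs(a - b) for a, b in zip(run_a, run_b)]
print("initial difference:", differences[0])
print("largest difference over the run:", max(differences))
```

The two runs start out indistinguishable for practical purposes and end up unrelated, which is the same mechanism by which last-digit differences between two builds can grow into visible differences in the model fields.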
 
Xun,

No guarantees it will do anything, but you can try removing -fp-model fast=1, -no-prec-div and -no-prec-sqrt from the FCBASEOPTS_NO_G line in configure.wrf and adding -fp-model=precise or even -fp-model=strict in their place. Keep in mind that this may come with some performance penalty.

More details here:
https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-fp-model-fp
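
If you prefer to script the change rather than edit configure.wrf by hand, something along these lines might work. It is only a sketch: the helper name and the example line are hypothetical, and the real FCBASEOPTS_NO_G line in your configure.wrf will contain other flags, so check the result before rebuilding.

```python
import re


def use_precise_fp(line, model="precise"):
    """Drop the fast-math flags from a flags line and append -fp-model=<model>."""
    # Remove any existing -fp-model option ("-fp-model fast=1" or "-fp-model=fast").
    line = re.sub(r"-fp-model[ =]\S+", "", line)
    # Remove the reduced-precision division and square-root options.
    line = re.sub(r"-no-prec-(div|sqrt)\b", "", line)
    # Collapse leftover whitespace and append the requested model.
    return " ".join(line.split()) + f" -fp-model={model}"


# Hypothetical example line, not copied from a real configure.wrf.
before = "FCBASEOPTS_NO_G = -ip -fp-model fast=1 -no-prec-div -no-prec-sqrt -w"
after = use_precise_fp(before)
print(after)
```

Passing model="strict" gives the stricter variant. The same change has to be made in the configure.wrf on both architectures, followed by a clean rebuild.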

Ivan
 
In reply to kwerner:

Hi,

Thank you for the reply and explanation!
 
In reply to meteoadriatic:

Hi Ivan,

Thanks for your suggestion! I'll take a look at it.
 