
Why do different compilers produce different results for the same case?

This post was from a previous version of the WRF&MPAS-A Support Forum.

Jing_iap

Hi,
I'm trying to use WRF version 4.0.2 to run a high-resolution simulation (9 km/3 km/1 km). However, when I build the model with different compilers (Intel 2017 and GCC 7.3.1) and run it on Intel 6142 CPUs, the results differ: the maximum difference in T2 even reaches 9 K.

Both runs use the same wrfinput_d01, wrfbdy_d01, and namelist.input. When compiling with the two compilers, I didn't change anything in configure.wrf except the DM_FC and DM_CC settings (mpif90 and mpicc for GCC; mpiifort and mpiicc for Intel).

I also checked other variables, such as Q2, RAINNC, and U10; all of them show obvious differences, just like T2.

Could you please tell me why different compilers give different results for the same case? Is this reasonable? How can I make them produce the same output?

Thanks!
 
Hi,
As I'm not a computer science person, I can't tell you "why" this happens, except that they process differently. This is completely expected, though, and we are aware that results will never be identical when using different compilers or machines.
 
kwerner said:
Hi,
As I'm not a computer science person, I can't tell you "why" this happens, except that they process differently. This is completely expected, though, and we are aware that results will never be identical when using different compilers or machines.
Thanks for the reply.
I understand that differences exist when using different compilers or machines, but are differences this large reasonable?
Do you have any idea which result is the "best" one? Based on your experience, do you have any recommendations about machines or compilers?
Thanks a lot!
 
Often, the differences arise from round-off error in floating-point operations, which different compilers and machines may handle differently. For example, compilers may use optimized math libraries or convert a division into a multiplication by the inverse, and different processors may provide different representations of floating-point numbers (e.g., extended precision) or use different rounding modes by default. Regardless of the source, the differences, which may initially be very small (around "machine epsilon"), can grow quickly over time in chaotic systems, leading to qualitative differences in the model results. Judt (JAS, 2018) nicely illustrates this error growth.
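As a small sketch of the division-to-multiplication rewrite mentioned above, the Fortran below writes both forms out by hand and counts how often they disagree in default (single) precision. The module and function names are made up for this illustration, and it assumes IEEE 754 single-precision reals with no extended-precision intermediates and no such rewriting done by the compiler itself.

```fortran
! Sketch: compare x / d with x * (1.0 / d), the kind of rewrite an
! optimizing compiler may apply. Names here are made up for illustration.
module recip_demo
    implicit none
contains
    ! Count the integers n in 1..nmax for which real(n) / d and
    ! real(n) * (1.0 / d) differ in default (single) precision.
    integer function count_mismatches(d, nmax)
        real, intent(in) :: d
        integer, intent(in) :: nmax
        real :: recip, x
        integer :: n

        recip = 1.0 / d          ! the reciprocal is itself rounded
        count_mismatches = 0
        do n = 1, nmax
            x = real(n)
            if (x / d /= x * recip) count_mismatches = count_mismatches + 1
        end do
    end function count_mismatches
end module recip_demo

program recip
    use recip_demo
    implicit none
    write(6,*) 'mismatches for d = 3.0:', count_mismatches(3.0, 1000)
    write(6,*) 'mismatches for d = 2.0:', count_mismatches(2.0, 1000)
end program recip
```

With a power-of-two divisor such as 2.0, both forms are exact, so no mismatches are expected. With 3.0, the rounded reciprocal can change the last bit: for instance, 5.0/3.0 and 5.0*(1.0/3.0) round to adjacent single-precision values.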

Below is a simple Fortran program that illustrates how the order of summation can change the result.

Code:
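program assoc

    implicit none
    real :: x, y, z, w1, w2

    x = 1.0
    y = 2.0**(-24)      ! half the spacing between 1.0 and the next larger real
    z = 2.0**(-24)

    w1 = (x + y) + z    ! adds each small term to 1.0 separately
    w2 = x + (y + z)    ! combines the two small terms first

    ! In exact arithmetic both differences equal 2**(-23); in single
    ! precision they typically print as 0.0 and about 1.19E-07
    write(6,*) (w1 - 1.0), (w2 - 1.0)

    stop

end program assoc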
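Tracing the program by hand shows where the two results part ways. This assumes IEEE 754 single precision for the default real (a 24-bit significand, so the spacing of reals just above 1.0 is $2^{-23}$), round-to-nearest-even, and no extended-precision intermediates or reordering by the compiler:

$$
\begin{aligned}
x + y &= 1 + 2^{-24} \;\to\; 1 && \text{(halfway between } 1 \text{ and } 1 + 2^{-23}\text{; the tie rounds to even)} \\
w_1 = (x + y) + z &= 1 + 2^{-24} \;\to\; 1 \\
y + z &= 2^{-24} + 2^{-24} = 2^{-23} && \text{(exact)} \\
w_2 = x + (y + z) &= 1 + 2^{-23} && \text{(exactly representable)}
\end{aligned}
$$

Under these assumptions, the program prints 0 for w1 - 1.0 and $2^{-23} \approx 1.19 \times 10^{-7}$ for w2 - 1.0.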

In exact arithmetic, (x + y) + z is identical to x + (y + z), but this does not hold in general in floating-point arithmetic. To compound matters, numerical models like WRF that contain parameterizations of complex physical processes often have conditional statements that depend on floating-point values, and it's easy to imagine how these can amplify differences by causing entirely different code to be executed, depending on the compiler; for example (building on the Fortran above):
Code:
   if (w1 > 1.0) then
      ! ... call some subroutine to handle the case where w1 > 1.0 ...
   else
      ! ... call a different subroutine to handle the case where w1 <= 1.0 ...
   end if
If w1 were instead computed in the same way as w2, an entirely different subroutine would be called.
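A runnable sketch of the branch above, reusing the values from the assoc program. The function branch_taken is hypothetical: it stands in for physics code and simply reports which branch runs (1 for w > 1.0, 2 otherwise). Assuming IEEE single precision and no reassociation by the compiler, w1 and w2 take different branches even though they are equal in exact arithmetic.

```fortran
! Sketch only: branch_taken is a hypothetical stand-in for physics code
! that picks a subroutine based on a floating-point test.
module branch_demo
    implicit none
contains
    ! Returns 1 for the "w > 1.0" branch and 2 for the "w <= 1.0" branch.
    integer function branch_taken(w)
        real, intent(in) :: w
        if (w > 1.0) then
            branch_taken = 1
        else
            branch_taken = 2
        end if
    end function branch_taken
end module branch_demo

program branch
    use branch_demo
    implicit none
    real :: x, y, z, w1, w2

    x = 1.0
    y = 2.0**(-24)
    z = 2.0**(-24)
    w1 = (x + y) + z    ! equal to w2 in exact arithmetic...
    w2 = x + (y + z)    ! ...but not necessarily in single precision

    ! Assuming IEEE single precision and no reassociation by the compiler,
    ! w1 rounds back to 1.0 and takes branch 2, while w2 takes branch 1.
    write(6,*) 'w1 takes branch', branch_taken(w1)
    write(6,*) 'w2 takes branch', branch_taken(w2)
end program branch
```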
 
I think my previous post may at least partially address the question of whether "large" differences in results are reasonable: essentially, given a round-off-level difference and a long enough integration time, we can get qualitatively different "weather".
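To illustrate how a round-off-level difference can grow, the sketch below uses the logistic map x -> 4x(1 - x), a textbook chaotic system chosen here as a stand-in; it is not WRF code, and the module and function names are hypothetical. Two copies start one unit in the last place apart (via the NEAREST intrinsic) and are iterated with identical arithmetic until they differ by more than a threshold.

```fortran
! Toy sketch: the logistic map x -> 4 x (1 - x) is a stand-in chaotic
! system chosen for illustration; it is not WRF code.
module chaos_demo
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains
    ! Iterate two copies of the map, the second started one ulp above the
    ! first, and return the first step at which they differ by more than
    ! threshold (or -1 if that never happens within max_steps).
    integer function steps_to_diverge(x0, threshold, max_steps) result(n_div)
        real(dp), intent(in) :: x0, threshold
        integer, intent(in) :: max_steps
        real(dp) :: a, b
        integer :: n

        a = x0
        b = nearest(x0, 1.0_dp)     ! smallest possible perturbation
        n_div = -1
        do n = 1, max_steps
            a = 4.0_dp * a * (1.0_dp - a)
            b = 4.0_dp * b * (1.0_dp - b)
            if (abs(a - b) > threshold) then
                n_div = n
                return
            end if
        end do
    end function steps_to_diverge
end module chaos_demo

program error_growth
    use chaos_demo
    implicit none
    write(6,*) 'Steps until |a - b| > 0.1:', steps_to_diverge(0.3_dp, 0.1_dp, 200)
end program error_growth
```

For this map the separation roughly doubles each step on average, so a one-ulp difference in double precision typically reaches order 0.1 within several dozen steps; the exact count depends on the starting value.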

On the question of which compiler and machine is "best", I think that, absent any compiler bugs, there is no "best" choice. Some compilers are available for free, but they tend to produce slower executables than commercial compilers, so whether speed or price matters more is up to you. Some compilers also provide better compile-time diagnostics and run-time checks, so if you intend to do significant model development, that could be a reason to choose one compiler over another. However, we've often found that having multiple compilers is helpful in development, as each compiler seems to have its own strengths when it comes to error checking.
 
Hi,
Should we also expect different results from different versions of the same compiler (e.g., Intel 18 and 19)?

Regards
 
Hi,

Yes, it is also possible for results to differ when the compiler version is different.
 