Issue in mpas_atmphys_o3climatology.F

JasonLO

Hi all,

I tried to optimize some loops to speed up MPAS simulations, but I have run into an issue I can't explain and hope someone can help.
Here is the loop I want to optimize by changing the loop order from
Code:
    do m = 1,num_months
    do k = 1,levsiz
    do j = 1,lonsiz
          [ code ]
    end do
    end do
    end do
to
Code:
    do j = 1,lonsiz
    do k = 1,levsiz
    do m = 1,num_months
          [ code ]
    end do
    end do
    end do

By checking the variables, I can confirm that changing the loop order doesn't affect the result of the data copy, since each element is independent of the others. But after I ran the simulation, the values had changed, which doesn't make sense to me.

Jason
 
How different are the results? What precision are you running at, single or double? Small changes to oxmix could be reasonable in this case.

From the code you shared, there's a "reduction": a summing of values from oxmixin(j,k,i,m) over the j-dimension into oxmix(m,k,i). Re-ordering these loops could cause differences because of how floating-point math works on computers and the rounding that occurs. The Addition section of the Wikipedia article on Round-off error gives a good, short explanation of this. Changing the precision the code runs at could affect how much rounding occurs, which could reduce, change, or eliminate the differences you saw.
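
To make the rounding point concrete, here is a minimal standalone sketch (not MPAS code; the program name, variable names, and values are chosen only for illustration). It adds the same three single-precision numbers with two different groupings. Assuming IEEE single precision, 1.0 is smaller than the spacing between representable values near 1.0e8, so `b + c` rounds back to `b` and the two groupings can give different sums.

```fortran
program sum_order
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  ! Illustrative values only; they are not taken from MPAS.
  real(real32) :: a, b, c
  real(real32) :: grouped_left, grouped_right

  a =  1.0e8_real32
  b = -1.0e8_real32
  c =  1.0_real32

  ! a and b cancel exactly first, so c survives.
  grouped_left = (a + b) + c

  ! c is added to the large value b first and can be lost to rounding.
  grouped_right = a + (b + c)

  print *, '(a + b) + c = ', grouped_left
  print *, 'a + (b + c) = ', grouped_right
  print *, 'identical?    ', grouped_left == grouped_right
end program sum_order
```

A reduction over a loop index behaves the same way: changing the loop order changes the grouping of the additions, so the last bits of the sum can change even though the mathematics is the same.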
 
Hi gdicker:

I compiled MPAS in single precision and only checked the global max/min values in the log file, since I hoped the optimization would not affect the results. I noticed that this small change affects the max/min wind speed and vertical velocity.
Although a "reduction" could happen in general, in this case lonsiz is set to 1, so j only takes the value 1 (see these lines). So I don't think the reduction comes into play here.

In fact, this happens not only in mpas_atmphys_o3climatology.F but also in mpas_atmphys_driver_seaice.F. For example, changing the following code from
Code:
do i = its,ite
..... (other code)
!--- inout variables:
do n = 1,num_soils
    tslb_p(i,n,j) = tslb(n,i)
enddo
.....(other code)
enddo
to
Code:
!--- inout variables:
do n = 1,num_soils
do i = its,ite
    tslb_p(i,n,j) = tslb(n,i)
enddo
enddo

do i = its,ite
    (other code)
end do
also affects the results.
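
One way to check the copy in isolation is a small standalone test like the sketch below. It is not MPAS code: the array sizes, bounds, and the names `copy_a` and `copy_b` are hypothetical, and only the shape of the assignment mirrors the snippet above. A pure copy involves no arithmetic, so both loop orders should give identical arrays. If they do, that would point to the remaining code in the loop, or to how the compiler optimizes it, as the source of the differences.

```fortran
program copy_order_check
  implicit none
  ! Hypothetical sizes and bounds, chosen only for illustration.
  integer, parameter :: its = 1, ite = 1000, num_soils = 4, j = 1
  real :: tslb(num_soils, its:ite)
  real :: copy_a(its:ite, num_soils, 1)
  real :: copy_b(its:ite, num_soils, 1)
  integer :: i, n

  call random_number(tslb)

  ! Original order: i in the outer loop, n in the inner loop.
  do i = its, ite
    do n = 1, num_soils
      copy_a(i, n, j) = tslb(n, i)
    end do
  end do

  ! Reordered: n in the outer loop, i in the inner loop.
  do n = 1, num_soils
    do i = its, ite
      copy_b(i, n, j) = tslb(n, i)
    end do
  end do

  ! Compare the two copies element by element.
  if (all(copy_a == copy_b)) then
    print *, 'copies are identical'
  else
    print *, 'copies differ, max abs diff = ', maxval(abs(copy_a - copy_b))
  end if
end program copy_order_check
```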
 
Ahh, thanks for pointing that out. It still could be that doing the floating point operations in a different order causes the rounding to occur in different orders.

This could also be due to how the compiler optimizes the code. Re-ordering the loops could cause different assembly code to be generated. It may be worth checking with different optimization flags when compiling. I'd suspect that runs with -O0 should match between loop orders, but that depends on the compiler implementation. Other flags may help too, particularly those that control floating-point behavior; your compiler's documentation lists them.

These same two points (rounding and optimization levels) likely apply to what you saw in mpas_atmphys_driver_seaice.F as well, perhaps leaning more towards compiler optimization, given all the assignments and operations in the loop from L212 to L270.

Answer changes may not seem to make sense in these cases, but they are something to expect and quantify when editing code for optimization. It is a trade-off that even compilers make with their optimization levels. Sometimes you may not be able to apply an optimization because the resulting differences are unacceptable.
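
As a rough sketch of what quantifying the differences could look like, the program below compares two versions of a field and reports the largest absolute and relative differences next to the machine epsilon for default reals. The arrays `baseline` and `optimized` are hypothetical stand-ins for the same output field from two runs; in practice they would be read from model output, which is not shown here.

```fortran
program compare_fields
  implicit none
  ! Hypothetical stand-ins for one field from a baseline run and an optimized run.
  integer, parameter :: n = 5
  real :: baseline(n), optimized(n)
  real :: max_abs_diff, max_rel_diff

  baseline  = [10.0, 20.0, 30.0, 40.0, 50.0]
  optimized = baseline
  ! Perturb one value to the next representable number, mimicking a rounding difference.
  optimized(3) = nearest(baseline(3), 1.0)

  max_abs_diff = maxval(abs(optimized - baseline))
  ! tiny() guards against division by zero where the baseline is zero.
  max_rel_diff = maxval(abs(optimized - baseline) / max(abs(baseline), tiny(1.0)))

  print *, 'max abs diff = ', max_abs_diff
  print *, 'max rel diff = ', max_rel_diff
  print *, 'epsilon      = ', epsilon(1.0)
end program compare_fields
```

Relative differences close to epsilon right after the changed code are consistent with rounding. Whether they stay acceptable as the simulation advances is a separate judgment that depends on the application.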
 
Hi gdicker

Thank you so much for your answer. I'll try to change the optimization level to see the result again.
 
BTW, I did see some performance improvement from reordering the 2D array loops. MPAS does a lot of dimension transforms in order to use the WRF physics, but some of those loops are not optimal. If the loop order is changed to column-first order, MPAS runs a little faster.
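
The effect of loop order on data locality can be sketched with a standalone timing test like the one below. It is not MPAS code, and the array sizes and names are hypothetical. Fortran stores arrays in column-major order, so the first index is contiguous in memory and is usually best placed in the innermost loop. An optimizing compiler may interchange the loops by itself, so the measured gap depends on the compiler and flags.

```fortran
program loop_order_timing
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  ! Hypothetical sizes; not MPAS dimensions.
  integer, parameter :: ni = 2000, nk = 2000
  real, allocatable :: src(:,:), dst_a(:,:), dst_b(:,:)
  integer :: i, k
  integer(int64) :: t0, t1, t2, rate

  allocate(src(ni, nk), dst_a(ni, nk), dst_b(ni, nk))
  call random_number(src)

  call system_clock(t0, rate)

  ! First index in the outer loop: the inner loop strides through memory.
  do i = 1, ni
    do k = 1, nk
      dst_a(i, k) = src(i, k)
    end do
  end do
  call system_clock(t1)

  ! First index in the inner loop: the inner loop walks contiguous memory.
  do k = 1, nk
    do i = 1, ni
      dst_b(i, k) = src(i, k)
    end do
  end do
  call system_clock(t2)

  print *, 'i outer, k inner (s): ', real(t1 - t0) / real(rate)
  print *, 'k outer, i inner (s): ', real(t2 - t1) / real(rate)
  ! Use the results so the compiler cannot discard the copy loops.
  print *, 'copies identical?     ', all(dst_a == dst_b)
end program loop_order_timing
```

A single timing is noisy, so repeating the measurement several times gives a more reliable comparison.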

The figure is a quick result of improving data locality, though I only ran the model once.
 

Attachment: Profiling Results: Elapsed Time Before and After Optimization (Excluding Stream Output, Time I...png