
Gradient of resolved turbulent heat flux in WRF-LES 3.9.1.1 (du'T'/dx, dv'T'/dy, dw'T'/dz)


winstonwu91

Hi,

I modified module_avgflx_em.F to compute the resolved turbulent heat flux (u'T', v'T', w'T') and its horizontal and vertical gradients (du'T'/dx, dv'T'/dy, dw'T'/dz). I added the code shown below. The u_mass, v_mass, and w_mass arrays are velocities at mass points, and I pass grid%u_phy, grid%v_phy, and grid%w_phy as inputs when upd_avgflx is called in solve_em.F.

While the resolved turbulent heat flux and the vertical gradient of the vertical heat flux work as expected, strange streaks appear in the horizontal gradients at the boundaries between the MPI tiles. I then tried
1) defining a halo for the horizontal gradients of the horizontal heat fluxes in registry.avgflx, and
2) including the corresponding halo and calling set_physical_bc3d for each of the horizontal gradients of the horizontal heat fluxes in solve_em.F, right after upd_avgflx.

The streaks are still present.


The modified code is as follows:

Code:
  subroutine upd_avgflx(avgflx_count,tol_steps,avgflx_rum,avgflx_rvm,avgflx_wwm, &
       &   ru_m, rv_m, ww_m, &
       & ids, ide, jds, jde, kds, kde,           &
       & ims, ime, jms, jme, kms, kme,           &
       & its, ite, jts, jte, kts, kte, do_cu,    &
       & cfu1,cfd1,dfu1,efu1,dfd1,efd1,          &
       & rdx, rdy, rdz,                   &
       & avgflx_cfu1,avgflx_cfd1,avgflx_dfu1,avgflx_efu1,avgflx_dfd1,avgflx_efd1, &
       & u_mass,v_mass,w_mass,t_mass, nba_mij, n_nba_mij, &
       & avgflx_u,avgflx_v,avgflx_w, avgflx_t, &
       & avgflx_uth_resolve, avgflx_vth_resolve, &
       & avgflx_wth_resolve, &
       & avgflx_dutdx, avgflx_dvtdy, avgflx_dwtdz)
       
       ...
       
       REAL, DIMENSION(ims:ime, kms:kme, jms:jme), INTENT(INOUT) :: &
       & avgflx_u, avgflx_v, avgflx_w, avgflx_t, &
       & avgflx_uth_resolve, avgflx_vth_resolve, &
       & avgflx_wth_resolve, &
       & avgflx_dutdx, avgflx_dvtdy, avgflx_dwtdz
       
       ...
  
   ! Time-averaged wind speed, temperature and resolved turbulent heat flux at mass points
       DO j=jts,jte
         DO k=kts,kte
            DO i=its,ite
                uprime(i,k,j) = u_mass(i,k,j) - avgflx_u(i,k,j)
                vprime(i,k,j) = v_mass(i,k,j) - avgflx_v(i,k,j)
                wprime(i,k,j) = w_mass(i,k,j) - avgflx_w(i,k,j)
                tprime(i,k,j) = t_mass(i,k,j) - avgflx_t(i,k,j)
                avgflx_uth_resolve(i,k,j) = (local_count*avgflx_uth_resolve(i,k,j) + uprime(i,k,j) * tprime(i,k,j) )/(local_count+1.)
                avgflx_vth_resolve(i,k,j) = (local_count*avgflx_vth_resolve(i,k,j) + vprime(i,k,j) * tprime(i,k,j) )/(local_count+1.)
                avgflx_wth_resolve(i,k,j) = (local_count*avgflx_wth_resolve(i,k,j) + wprime(i,k,j) * tprime(i,k,j) )/(local_count+1.)
                avgflx_u(i,k,j) = (local_count*avgflx_u(i,k,j) + u_mass(i,k,j))/(local_count+1.)
                avgflx_v(i,k,j) = (local_count*avgflx_v(i,k,j) + v_mass(i,k,j))/(local_count+1.)
                avgflx_w(i,k,j) = (local_count*avgflx_w(i,k,j) + w_mass(i,k,j))/(local_count+1.)
                avgflx_t(i,k,j) = (local_count*avgflx_t(i,k,j) + t_mass(i,k,j))/(local_count+1.)
                ...
            END DO
          END DO
        END DO
   
    ! Gradient of turbulent heat flux
    ktf     = MIN(kte,kde-1)
    i_start = its
    i_end   = MIN(ite,ide-1)
    j_start = jts
    j_end   = MIN(jte,jde-1)
    
    DO j=j_start-1,j_end+1 ! I tried j_start,j_end; kts, ktf; i_start, i_end too, streaks are still there.
      DO k=kts-1,ktf+1
        DO i=i_start-1,i_end+1
            ! Central difference for heat flux gradient at mass points
            avgflx_dwtdz(i,k,j) = 0.5 * (avgflx_wth_resolve(i,k+1,j) - avgflx_wth_resolve(i,k-1,j)) * rdz(i,k,j)
            avgflx_dutdx(i,k,j) = 0.5 * (avgflx_uth_resolve(i+1,k,j) - avgflx_uth_resolve(i-1,k,j)) * rdx
            avgflx_dvtdy(i,k,j) = 0.5 * (avgflx_vth_resolve(i,k,j+1) - avgflx_vth_resolve(i,k,j-1)) * rdy
        END DO
      END DO
    END DO

The halo added at the end of registry.avgflx:

Code:
...
halo HALO_EM_AVGFLX 8:avgflx_dutdx,avgflx_dvtdy,avgflx_dwtdz
period PERIOD_EM_AVGFLX 3:avgflx_dutdx,avgflx_dvtdy,avgflx_dwtdz

And the corresponding code in solve_em.F:
Code:
...
!After calling upd_avgflx
#ifdef DM_PARALLEL
#      include "HALO_EM_AVGFLX.inc"
#      include "PERIOD_EM_AVGFLX.inc"
#endif

    !$OMP PARALLEL DO   &
    !$OMP PRIVATE ( ij )
    DO ij = 1 , grid%num_tiles !--------------------------------
             CALL set_physical_bc3d( grid%avgflx_dutdx, 't', config_flags,     &
                                  ids, ide, jds, jde, kds, kde,     &
                                  ims, ime, jms, jme, kms, kme,     &
                                  ips, ipe, jps, jpe, kps, kpe,     &
                                  grid%i_start(ij), grid%i_end(ij), &
                                  grid%j_start(ij), grid%j_end(ij), &
                                  k_start, k_end                    )
                                  
          CALL set_physical_bc3d( grid%avgflx_dvtdy, 't', config_flags,     &
                                  ids, ide, jds, jde, kds, kde,     &
                                  ims, ime, jms, jme, kms, kme,     &
                                  ips, ipe, jps, jpe, kps, kpe,     &
                                  grid%i_start(ij), grid%i_end(ij), &
                                  grid%j_start(ij), grid%j_end(ij), &
                                  k_start, k_end                    )
                                  
          CALL set_physical_bc3d( grid%avgflx_dwtdz, 't', config_flags,     &
                                  ids, ide, jds, jde, kds, kde,     &
                                  ims, ime, jms, jme, kms, kme,     &
                                  ips, ipe, jps, jpe, kps, kpe,     &
                                  grid%i_start(ij), grid%i_end(ij), &
                                  grid%j_start(ij), grid%j_end(ij), &
                                  k_start, k_end                    ) 
      ENDDO !-----------------------------------------------------
    !$OMP END PARALLEL DO
    ...

The streaks look like this. My Ntasks in X is 18, so the streaks line up with the MPI-tile boundaries.

[Attached image: 2021-01-19_10-47.png, showing the streaks at the MPI-tile boundaries]

I am out of ideas for how to remove the streaks between tiles. Is there a way to correctly compute the horizontal gradients of the horizontal fluxes?

Best,

Winston
 
Winston,

Yes, this looks exactly like an MPI communications issue. This could be a back-and-forth discussion, so please let me know if I am interpreting anything incorrectly.

I'll grab a few snippets of your code, just for the u values (for simplicity). If these are all mass points, the variables are really indistinguishable from a staggering perspective. For these "window pane" figures that identify the MPI decomposition, what I always look for is a computation that goes off of the (i,j) column, basically something that will demand a communication with neighboring MPI processes (whether north, south, east, or west neighbors).

This first section of your code (written AS-IS) requires no comms. This is similar to the way that we set up loop limits within a column-oriented physics routine. I am assuming that the following are ALL on mass points: uprime, u_mass, avgflx_u, avgflx_uth_resolve, tprime.
Code:
   ! Time-averaged wind speed, temperature and resolved turbulent heat flux at mass points
       DO j=jts,jte
         DO k=kts,kte
            DO i=its,ite
                uprime(i,k,j) = u_mass(i,k,j) - avgflx_u(i,k,j)
                avgflx_uth_resolve(i,k,j) = (local_count*avgflx_uth_resolve(i,k,j) + uprime(i,k,j) * tprime(i,k,j) )/(local_count+1.)
                avgflx_u(i,k,j) = (local_count*avgflx_u(i,k,j) + u_mass(i,k,j))/(local_count+1.)

This second part seems to be a good contender for the problematic piece since we have centered differences for avgflx_uth_resolve that use (i-1) and (i+1).
Code:
   ! Gradient of turbulent heat flux     
    ktf=MIN(kte,kde-1)
    i_start = its
    i_end   = MIN(ite,ide-1)
    j_start = jts
    j_end   = MIN(jte,jde-1)
    
    DO j=j_start-1,j_end+1 ! I tried j_start,j_end; kts, ktf; i_start, i_end too, streaks are still there.
      DO k=kts-1,ktf+1
        DO i=i_start-1,i_end+1
           ! Central difference for heat flux gradient at mass points
           avgflx_dutdx(i,k,j) = 0.5 * (avgflx_uth_resolve(i+1,k,j) - avgflx_uth_resolve(i-1,k,j)) * rdx

In the first loop, we COMPUTE (re-write, update) avgflx_uth_resolve from its through ite. This computation means that a value for this variable avgflx_uth_resolve is no longer valid in the halo region if we try to use this variable later (such as in the second loop). Another way to state this is that any appearance of a variable on the LHS of an equation invalidates the halo for subsequent use of that variable on the RHS of an equation.

In the second loop we USE values of avgflx_uth_resolve from its-2 through ite+2. The values of avgflx_uth_resolve used in the second loop that are outside of the range its through ite come from the halo region, and are necessarily no longer up-to-date.
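To make that concrete, here is a minimal illustration (not WRF code; the arrays a, b, dadx and the constant rdx are just placeholders, with the usual tile index names):

Code:
   ! Loop 1: tile-local update.  Only its..ite / jts..jte are written, so the
   ! halo copies of a on this process (e.g. a(its-1,k,j)) still hold OLD data.
   DO j = jts, jte
     DO k = kts, kte
       DO i = its, ite
          a(i,k,j) = 0.5 * ( a(i,k,j) + b(i,k,j) )
       END DO
     END DO
   END DO

   ! Loop 2: a centered difference that reads a(i-1,k,j) and a(i+1,k,j).
   ! At the tile edges this pulls the stale halo values of a, which no longer
   ! match what the neighboring process just computed; hence the streaks.
   DO j = jts, jte
     DO k = kts, kte
       DO i = its, ite
          dadx(i,k,j) = 0.5 * ( a(i+1,k,j) - a(i-1,k,j) ) * rdx
       END DO
     END DO
   END DO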

I think that this is the trouble that you are seeing. If this is the case, here is a proposal:

For a strategy, may I suggest that you communicate all of the variables that will be used on the RHS inside of the first loop, and place that communication prior to the first loop. Since your second loop extends from its-2 through ite+2, use a 24-point halo stencil. In the first loop, you will do redundant computations (two neighboring processors will independently compute values along the boundary, and within their halo region). In the first loop, extend your indexing so that the assignment of variables is now from its-2 through ite+2 (of course, same with j). Then, even though you have invalidated the halo region by assigning to variables on the LHS (in the first loop), you are only using those values that you have locally assigned on the RHS (in the second loop).
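For what it is worth, here is a rough sketch of how the first loop's limits could be extended under that strategy; the clipping at the domain edges and the local names i_lo, i_hi, j_lo, j_hi are my assumptions, not tested code:

Code:
   ! Sketch only.  Assumes the RHS fields (u_mass/v_mass/w_mass/t_mass and
   ! the avgflx_* running means) were exchanged with a 24-point halo just
   ! before upd_avgflx is called, via a new registry halo entry patterned on
   ! the existing ones.
   i_lo = MAX( its-2, ids   )
   i_hi = MIN( ite+2, ide-1 )
   j_lo = MAX( jts-2, jds   )
   j_hi = MIN( jte+2, jde-1 )

   DO j = j_lo, j_hi
     DO k = kts, kte
       DO i = i_lo, i_hi
          ! redundant near tile edges, but every point the gradient loop will
          ! later read is now computed locally from freshly exchanged data
          uprime(i,k,j) = u_mass(i,k,j) - avgflx_u(i,k,j)
          tprime(i,k,j) = t_mass(i,k,j) - avgflx_t(i,k,j)
          avgflx_uth_resolve(i,k,j) = ( local_count*avgflx_uth_resolve(i,k,j) &
                                      + uprime(i,k,j)*tprime(i,k,j) ) / (local_count+1.)
          ! ... same update for the v, w and mean fields ...
       END DO
     END DO
   END DO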

The communications only work in routines that have the grid% values available, so you have those placed correctly.
 
Hi Dave,

Thanks for your reply. Correct me if I am wrong.

For the following part of my second loop,

Code:
    DO j=j_start-1,j_end+1 ! I tried j_start,j_end; kts, ktf; i_start, i_end too, streaks are still there.
      DO k=kts-1,ktf+1
        DO i=i_start-1,i_end+1

I changed the limits to j_start to j_end, kts to ktf, and i_start to i_end, so the second loop now uses values from its-1 through ite+1 and jts-1 through jte+1. For the first loop, I extended the indexing to run from its-1 to ite+1 (and likewise for j). I also investigated du/dx (u from u_phy) and dT/dx (T from t_2).

After the modification, the tiles in du'T'/dx are still there. For du/dx and dT/dx, there are no MPI tiles in dT/dx (and if I do not extend the first loop, tiles appear in dT/dx!), but there are tiles in du/dx, so I think the MPI problem mostly lies in the handling of u_phy. It seems the communication of t_2 between processes is done, while that of u_phy is not. So I defined a halo for u_phy, v_phy, and w_phy (as you suggested, communicating the variables used on the RHS of the first loop) and placed the communication before calling upd_avgflx in solve_em.F.

In Registry.EM_COMMON (I am now using its-1 to ite+1, so I think an 8-point halo stencil will work):
Code:
halo      HALO_EM_UVMASS dyn_em    8:u_phy,v_phy,w_phy

and in solve_em.F
Code:
#ifdef DM_PARALLEL
#      include "HALO_EM_UVMASS.inc"
#endif

! Update avgflx quantities
!tile-loop for upd_avgflx
   !$OMP PARALLEL DO   &
   !$OMP PRIVATE ( ij )

      DO ij = 1 , grid%num_tiles
         CALL upd_avgflx(grid%avgflx_count,0,grid%avgflx_rum,grid%avgflx_rvm,grid%avgflx_wwm, &
        ....

The tiles are still present even after communicating u_phy, v_phy, and w_phy. Even when I communicate avgflx_uth_resolve before calling upd_avgflx, the tiles remain.

I am now thinking that I may need to split the two loops into two subroutines and perform the halo MPI communication for the flux variables used in the gradient calculation between the first and second loops.
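Roughly, the structure I have in mind in solve_em.F would be something like this (the split subroutine names and the halo name are placeholders I have not written yet):

Code:
! 1) first pass: update the time-averaged fields and resolved fluxes per tile
   !$OMP PARALLEL DO   &
   !$OMP PRIVATE ( ij )
   DO ij = 1 , grid%num_tiles
      CALL upd_avgflx_fluxes( ... )          ! first loop of the current routine
   ENDDO
   !$OMP END PARALLEL DO

! 2) exchange the freshly updated fluxes between MPI processes
!    (would need a registry entry along the lines of
!     halo HALO_EM_AVGFLX_FLUX dyn_em 8:avgflx_uth_resolve,avgflx_vth_resolve,avgflx_wth_resolve)
#ifdef DM_PARALLEL
#      include "HALO_EM_AVGFLX_FLUX.inc"
#endif

! 3) second pass: the centered differences can now read valid i-1/i+1, j-1/j+1
   !$OMP PARALLEL DO   &
   !$OMP PRIVATE ( ij )
   DO ij = 1 , grid%num_tiles
      CALL upd_avgflx_gradients( ... )       ! second loop of the current routine
   ENDDO
   !$OMP END PARALLEL DO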

Winston
 