Program stops in FTUV module with anthropogenic file after running a few hours


zxdawn

This issue was first posted in the Google Group: https://groups.google.com/a/ucar.edu/forum/#!topic/wrf-chem-run/pDsk5McDeqg

I'm copying it here in case anyone has ideas.

I'm using ndown to run the WRF-Chem model and get an error for domain 2.
I also tested without ndown and got the same error.

Here's the end of rsl.error.0000:

Code:
d01 2019-07-25_01:00:00            2 input_wrf: wrf_get_next_time current_date: 2019-07-25_01:00:00 Status =            0
**WARNING** Time in input file not being checked **WARNING**
input_wrf.F reading 4d real E_CO
 date 2019-07-25_01:00:00
 ds            1           1           1
 de          150           1         150
 ps            1           1           1
 pe           15           1           7
 ms           -4           1          -4
 me           22           1          14
d01 2019-07-25_01:00:00 module_io.F: in wrf_read_field


Although the job log reports an error in rsl.error.0344, I suspect it is caused by reading the anthropogenic emission file,
because the run works if I turn auxinput_5 off ...

Anyway, here's the end of rsl.error.0344:

Code:
d01 2019-07-25_00:59:50 calling photolysis driver
 photolysis_driver: called for domain            1
d01 2019-07-25_00:59:50 calling ftuv_driver
d01 2019-07-25_00:59:50 no aerosols initialization yet
no aerosols initialization yet
forrtl: severe (174): SIGSEGV, segmentation fault occurred

I tried the same namelist and emission files with WRF-Chem V3.7.1 instead of WRF-Chem V4.1.4, and it works.

As mentioned above, WRF-Chem V4.1.4 also works if I turn off the anthropogenic emission input.

So I suspect that some difference between the two versions, related to reading the anthropogenic emissions, leads to the failure.
 

Attachments

  • namelist.input
    7.2 KB

Hi,

Could you try running with a larger value for debug_level to see exactly where the model is stopping? If you believe the problem lies with the emissions, please also add some write(*,*) statements in module_emissions_anthropogenics.F around these lines and recompile:

!
! add emissions
!
do j=jts,jte
   k_loop: DO k=kts,min(config_flags%kemit,kte)
      conv_rho(its:ite) = 4.828e-4/rho_phy(its:ite,k,j)*dtstep/(dz8w(its:ite,k,j)*60.)
      conv_rho_aer(its:ite) = alt(its:ite,k,j)*dtstep/dz8w(its:ite,k,j)
      chem(its:ite,k,j,p_so2) = chem(its:ite,k,j,p_so2) + emis_ant(its:ite,k,j,p_e_so2)*conv_rho(its:ite)
      chem(its:ite,k,j,p_co) = chem(its:ite,k,j,p_co) + emis_ant(its:ite,k,j,p_e_co)*conv_rho(its:ite)
      chem(its:ite,k,j,p_no) = chem(its:ite,k,j,p_no) + emis_ant(its:ite,k,j,p_e_no)*conv_rho(its:ite)
      chem(its:ite,k,j,p_no2) = chem(its:ite,k,j,p_no2) + emis_ant(its:ite,k,j,p_e_no2)*conv_rho(its:ite)
      chem(its:ite,k,j,p_nh3) = chem(its:ite,k,j,p_nh3) + emis_ant(its:ite,k,j,p_e_nh3)*conv_rho(its:ite)
      chem(its:ite,k,j,p_hcl) = chem(its:ite,k,j,p_hcl) + emis_ant(its:ite,k,j,p_e_hcl)*conv_rho(its:ite)
      chem(its:ite,k,j,p_ch3cl) = chem(its:ite,k,j,p_ch3cl) + emis_ant(its:ite,k,j,p_e_ch3cl)*conv_rho(its:ite)
      is_mozart:if( is_moz_chm ) then
         chem(its:ite,k,j,p_bigalk) = chem(its:ite,k,j,p_bigalk) + emis_ant(its:ite,k,j,p_e_bigalk)*conv_rho(its:ite)
         chem(its:ite,k,j,p_bigene) = chem(its:ite,k,j,p_bigene) + emis_ant(its:ite,k,j,p_e_bigene)*conv_rho(its:ite)


Jordan
 
Hi Jordan,

I increased debug_level to 1000 and found that something goes wrong in the FTUV module.
So I added some messages around the place where the error happens:

Filename:
Code:
./chem/module_ftuv_subs.F

Edit:
Code:
!-----------------------------------------------------------------------------
! ... calculate sum of exponentials (eqs 7 and 8 of kockarts 1994)
!-----------------------------------------------------------------------------
      call wrf_message('Xin: begin calculate sum of exponentials')
      do iw = 1,ngast-1
         do k = 1,nz
            ki = index(k)
            call wrf_message('****************************')
            write(err_msg,*) 'Xin: ki ',ki
            call wrf_message(err_msg)
            write(err_msg, *) 'Xin: x_table ',x_table(ki,iw)
            call wrf_message(err_msg)
            write(err_msg, *) 'Xin: dels ',dels(k)
            call wrf_message(err_msg)
            write(err_msg, *) 'Xin: diff_x ',x_table(ki+1,iw) - x_table(ki,iw)
            call wrf_message(err_msg)
            rjm(k) = x_table(ki,iw) + dels(k)*(x_table(ki+1,iw) - x_table(ki,iw))
            write(err_msg, *) 'Xin: d_table ',d_table(ki,iw)
            call wrf_message(err_msg)
            write(err_msg, *) 'Xin: diff_d ',d_table(ki+1,iw) - d_table(ki,iw)
            call wrf_message(err_msg)
            rjo2(k) = d_table(ki,iw) + dels(k)*(d_table(ki+1,iw) - d_table(ki,iw))
            call wrf_message('****************************')
         end do

Here's the error in the job log:
Code:
yhrun: error: cn11772: task 98: Exited with exit code 174
yhrun: First task exited 60s ago
yhrun: tasks 0-97,99-119: running
yhrun: task 98: exited abnormally

And this is the error mentioned in rsl.error.0098:
Code:
Xin: begin calculate sum of exponentials
****************************
 Xin: ki    504216597
forrtl: severe (174): SIGSEGV, segmentation fault occurred

The strange thing is that the statement following the last printed line is just another added message, not a calculation ....
Why would the segmentation fault appear around that line?
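
One plausible reading of the log above: the last message is printed right after `ki = index(k)`, and the very next added statement reads `x_table(ki,iw)`. With `ki = 504216597` that read falls far outside the table, which could explain a SIGSEGV at a line that only prints a value. Below is a minimal sketch of a guard that could be called before the table access. The module name `table_guard`, the function `index_in_table`, and the assumption that valid indices run from 1 to `tdim-1` (because row `ki+1` is read as well) are illustrative and not part of WRF-Chem.

```fortran
module table_guard
  implicit none
  private
  public :: index_in_table

contains

  ! Return .true. when both ki and ki+1 are valid rows of a lookup
  ! table with tdim rows, i.e. when 1 <= ki <= tdim-1.
  ! (Assumption: the table's first dimension runs from 1 to tdim.)
  pure logical function index_in_table(ki, tdim)
    integer, intent(in) :: ki    ! interpolation index to be checked
    integer, intent(in) :: tdim  ! number of rows in the lookup table

    index_in_table = (ki >= 1) .and. (ki <= tdim - 1)
  end function index_in_table

end module table_guard
```

Calling `index_in_table(ki, tdim)` before the interpolation and printing a message when it returns `.false.` would turn the crash into a readable diagnostic. The initialization shown later in this thread also assigns index 0 when the column is zero; how that case is handled downstream is not shown here, so the lower bound may need adjusting.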

As mentioned before, the same settings and data files didn't cause this error in WRF-Chem V3.7.1.

Regards,
Xin
 

Attachments

  • rsl.error.0098.tar.gz
    53.8 MB

Hi Jordan,

Sorry for the long error log. I set debug_level to 0 and wrote only the necessary information to the newly attached log file.

The issue is actually in the initialization of the table interpolation in chem/module_ftuv_subs.F.
I attach the GitHub link to the corresponding lines here.

I added the output of ki and index(ki) like this:
Code:
            index(ki) = 0
            dels(ki) = 0._dp
         end if
         write(err_msg,*) 'Xin: ki, index(ki) ', ki, index(ki)
         call wrf_message(err_msg)
      end do

As you can see in the error log, the first two values are very large:
Code:
Xin: start initialize the table interpolation
 Xin: ki, index(ki)            1  1650901209
 Xin: ki, index(ki)            2  1070446359
 Xin: ki, index(ki)            3         274
 Xin: ki, index(ki)            4         274
 Xin: ki, index(ki)            5         273

This leads to the huge index in the sum-of-exponentials loop:
Code:
Xin: begin calculate sum of exponentials
****************************
 Xin: k, ki            1  1650901209


However, if I check the initialization of the table interpolation at earlier time steps, all index(ki) values look normal:
Code:
Xin: start initialize the table interpolation
 Xin: ki, index(ki)            1         275
 Xin: ki, index(ki)            2         274
 Xin: ki, index(ki)            3         274
 Xin: ki, index(ki)            4         274
 Xin: ki, index(ki)            5         273
 .....

This is strange ... Could this be caused by the machine, or by something else?

Regards,
Xin
 

Attachments

  • rsl.error_debug0.0098.tar.gz
    1.9 MB

I suppose the simplest fix is adding this line to the loop that initializes the table interpolation:
Code:
index(ki) = min(index(ki),tdim-1)

But I'm not sure where the value of tdim comes from ...
The definition of tdim in chem/module_ftuv_subs.F is:
Code:
integer, private, parameter  :: tdim = 501

Although the maximum value of index(ki) is 275, I couldn't find any reassignment of tdim in the code ...
That should be the key to this issue.

Xin
 
Hi Xin,

I will look into the possible issue with this piece of code and get back to you. First, are you supplying the model with the exo_coldens file (https://www.acom.ucar.edu/wrf-chem/MOZCART_UsersGuide.pdf)?

Jordan
 
Hi Xin,

Just wanted to make sure: it seems the error always occurs at ki = 1 or 2, correct?

I have two thoughts. The first is to add print statements for ki <= 2 that print lo2col(ki) and the o2_table values at the point where the index is chosen.

The other is that it is a memory issue, and either "index" or "tdim" is being overwritten somewhere, though you are correct that tdim is not assigned elsewhere.

I will keep looking.

Jordan
 
Hi Jordan,

Yes, the error always occurs at ki = 1 or 2.

Tomorrow I will add print statements for ki <= 2 that print lo2col(ki) and the o2_table values, and I'll let you know the results.

Regards,
Xin
 
Hi Jordan,

The values of o2col(ki) and lo2col(ki) are NaN when ki = 1 or 2, while o2_table(1) and o2_table(tdim) are 22 and 27, respectively.
As a result, index(ki) is never assigned ...

Here's where o2col is assigned:
Code:
o2col(1:nz) = 0.2095_dp * scol(1:nz)

scol, in turn, is computed in subroutine airmas.

Here's the edited chem/module_ftuv_subs.F:
Code:
!-----------------------------------------------------------------------------
! ... initialize the table interpolation
!-----------------------------------------------------------------------------
      call wrf_message('Xin: start initialize the table interpolation')
      where( o2col(:) /= 0 )
         lo2col(:) = log10( o2col(:) )
      endwhere
      do ki = 1,nz
         if( o2col(ki) /= 0._dp ) then
            write(err_msg,*) 'Xin: ki, o2col(ki) ', ki, o2col(ki)
            call wrf_message(err_msg)
            write(err_msg,*) 'Xin:lo2col(ki) ', lo2col(ki)
            call wrf_message(err_msg)
            write(err_msg,*) 'Xin: o2_table(1), o2_table(tdim) ', o2_table(1), o2_table(tdim)
            call wrf_message(err_msg)
            if( lo2col(ki) <= o2_table(1) ) then
               write(err_msg,*) 'Xin: lo2col(ki) <= o2_table(1)'
               call wrf_message(err_msg)
               dels(ki) = 0._dp
               index(ki) = 1
            else if( lo2col(ki) >= o2_table(tdim) ) then
               write(err_msg,*) 'Xin: lo2col(ki) >= o2_table(tdim)'
               call wrf_message(err_msg)
               dels(ki) = 1._dp
               index(ki) = tdim-1
            else
               do k = 2,tdim
                  if( lo2col(ki) <= o2_table(k) ) then
                     write(err_msg,*) 'Xin: lo2col(ki) <= o2_table(k)'
                     call wrf_message(err_msg)
                     write(err_msg,*) 'Xin: k, t_fac ', k, t_fac
                     call wrf_message(err_msg)
                     write(err_msg,*) 'Xin: lo2col(ki), o2_table(k-1) ', lo2col(ki), o2_table(k-1)
                     call wrf_message(err_msg)
                     dels(ki) = t_fac*(lo2col(ki) - o2_table(k-1))
                     index(ki) = k-1
                     exit
                  end if
               end do
            end if
         else
            index(ki) = 0
            dels(ki) = 0._dp
         end if
         write(err_msg,*) 'Xin: ki, index(ki), tdim ', ki, index(ki), tdim
         call wrf_message(err_msg)
      end do
      call wrf_message('Xin: end initialize the table interpolation')
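
The behaviour reported above is consistent with how IEEE NaN compares: `NaN /= 0` is true, while `NaN <= x` and `NaN >= x` are both false, so no branch assigns `index(ki)` and it keeps whatever happened to be in memory. The sketch below reproduces the same branch structure with an explicit NaN check. The names `lookup_mod` and `find_index` and the `-1` sentinel are assumptions made for illustration; this is not WRF-Chem code.

```fortran
module lookup_mod
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
  private
  public :: find_index
  integer, parameter, public :: dp = kind(1.0d0)

contains

  ! Return the interval index of x in the ascending table tab,
  ! or -1 when x is NaN (hypothetical sentinel, chosen so the caller
  ! can detect the case instead of using an unassigned index).
  integer function find_index(x, tab) result(idx)
    real(dp), intent(in) :: x       ! e.g. log10 of the O2 column
    real(dp), intent(in) :: tab(:)  ! ascending lookup table
    integer :: k, n

    n = size(tab)
    if (ieee_is_nan(x)) then
      idx = -1                      ! every ordered comparison would fail
    else if (x <= tab(1)) then
      idx = 1                       ! at or below the table
    else if (x >= tab(n)) then
      idx = n - 1                   ! at or above the table
    else
      idx = n - 1                   ! overwritten by the search below
      do k = 2, n
        if (x <= tab(k)) then
          idx = k - 1               ! x lies in (tab(k-1), tab(k)]
          exit
        end if
      end do
    end if
  end function find_index

end module lookup_mod
```

With a check like this the caller can stop with a clear message when the column is NaN, which points the search back to where `scol` is computed rather than to the interpolation itself.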

Regards,
Xin
 

Attachments

  • rsl.error.0098_o2col.tar.gz
    5.3 MB

Hi Xin,

Thanks for doing that. I'm happy to continue down the rabbit hole on this subroutine, but what do you think could be causing the difference between the two WRF-Chem versions, and between anthro emissions on vs. off? It doesn't look like this code has been touched between 3.7 and 4.1.
 
Hi Jordan,

It's difficult to compare two different versions of WRF-Chem.

Anyway, I tried running V4.1.4 with the wrfinput and wrfbdy files generated by V3.7.1, while keeping the same emission files as in the V4.1.4 run.
The settings I needed to change in namelist.input are:

Code:
 &time_control
 force_use_old_data                  = .true.,

 &dynamics
 hybrid_opt                          = 0,
 use_theta_m                         = 0,

Then it works well.
So I guess this may be caused by the different coordinate?

I'll disable hybrid_opt and use_theta_m one at a time to check which one is the cause.

Regards,
Xin
 
Hi Jordan,

I tested three options for V4.1.4:

Option_1:
Code:
 hybrid_opt                          = 0,
 use_theta_m                         = 0,

Option_2:
Code:
 use_theta_m                         = 0,

Option_3:
Code:
 hybrid_opt                          = 0,

Both Option_1 and Option_2 work well, so the problem should be related to use_theta_m.
 
Hi Xin,

Yes, that seems to be the case. Nice sleuthing on your part. I am not well-versed in the transition of the vertical coordinates, but I do know that it can cause issues, and it makes sense that it would occur for levels at the boundary. To be sure, you could create the met_em files (and then wrfbdy, wrfinput) using only v4 WPS/WRF. Glad you were able to solve this!

Cheers,

Jordan
 
Hi Jordan,

Although I have found the solution for this case, I ran into another issue that is similar to this one.

When I include the anthropogenic file, WRF-Chem hangs just as it starts working on domain 3:

Code:
d02 2020-08-31_22:00:00 calling conv transport for chemical species
d02 2020-08-31_22:00:00 calling calc_het_n2o5
d02 2020-08-31_22:00:00 calling kpp_mechanism_driver
d02 2020-08-31_22:00:00 kpp_mechanism_driver: calling mozcart_interface
d02 2020-08-31_22:00:00 no gocart so2-so4 conversion
d02 2020-08-31_22:00:00 wetscav_driver calling wetscav_mozcart
d02 2020-08-31_22:00:00 sum_pm_driver: calling sum_pm_gocart
d02 2020-08-31_22:00:00 done tileloop in chem_driver
d03 2020-08-31_22:00:00 calling inc/HALO_EM_COUPLE_A_inline.inc

And it works well without auxinput5:

Code:
d02 2020-08-31_22:00:00 calling conv transport for chemical species
d02 2020-08-31_22:00:00 calling calc_het_n2o5
d02 2020-08-31_22:00:00 calling kpp_mechanism_driver
d02 2020-08-31_22:00:00 kpp_mechanism_driver: calling mozcart_interface
d02 2020-08-31_22:00:00 no gocart so2-so4 conversion
d02 2020-08-31_22:00:00 wetscav_driver calling wetscav_mozcart
d02 2020-08-31_22:00:00 sum_pm_driver: calling sum_pm_gocart
d02 2020-08-31_22:00:00 done tileloop in chem_driver
d03 2020-08-31_22:00:00 calling inc/HALO_EM_COUPLE_A_inline.inc
d03 2020-08-31_22:00:00 calling inc/PERIOD_EM_COUPLE_A_inline.inc
d03 2020-08-31_22:00:00 calling inc/HALO_EM_COUPLE_B_inline.inc
d03 2020-08-31_22:00:00 calling inc/PERIOD_EM_COUPLE_B_inline.inc
d03 2020-08-31_22:00:00 calling inc/HALO_EM_COUPLE_A_inline.inc
d03 2020-08-31_22:00:00 calling inc/PERIOD_EM_COUPLE_A_inline.inc
d03 2020-08-31_22:00:00 calling inc/HALO_EM_COUPLE_B_inline.inc
d03 2020-08-31_22:00:00 calling inc/PERIOD_EM_COUPLE_B_inline.inc
d03 2020-08-31_22:00:00 calling inc/HALO_FORCE_DOWN_inline.inc
d03 2020-08-31_22:00:00 calling inc/HALO_EM_COUPLE_A_inline.inc
d03 2020-08-31_22:00:00 calling inc/PERIOD_EM_COUPLE_A_inline.inc
d03 2020-08-31_22:00:00 calling inc/HALO_EM_COUPLE_B_inline.inc
d03 2020-08-31_22:00:00 calling inc/PERIOD_EM_COUPLE_B_inline.inc
d03 2020-08-31_22:00:00 calling inc/HALO_EM_COUPLE_A_inline.inc
d03 2020-08-31_22:00:00 calling inc/PERIOD_EM_COUPLE_A_inline.inc
d03 2020-08-31_22:00:00 calling inc/HALO_EM_COUPLE_B_inline.inc
d03 2020-08-31_22:00:00 calling inc/PERIOD_EM_COUPLE_B_inline.inc
 *************************************
 Nesting domain
 ids,ide,jds,jde            1         202           1         202
 ims,ime,jms,jme           -4          33          -4          27
 ips,ipe,jps,jpe            1          21           1          17
 INTERMEDIATE domain
 ids,ide,jds,jde           48         120          63         135
 ims,ime,jms,jme           43          66          58          80
 ips,ipe,jps,jpe           46          56          61          70
 *************************************


This issue happens for both WRFV4.1.4 and WRFV3.7.1.

Is there anything wrong with the namelist.input or the chemi file?

Regards,
Xin
 

Attachments

  • namelist.input
    7.9 KB
  • errors.zip
    20 KB

Update for the last problem:

Actually, this was caused by a wrong wrfbio* file.

Because I ran the model from 2020-08-31 to 2020-09-01, I should have set a duration of three months:
Code:
&control

domains = 4,
start_lai_mnth = 7,
end_lai_mnth   = 9,
wrf_dir   = './wrf/',
megan_dir = '../data/US/'
/

The strange thing is that the model only hangs when I add the anthropogenic input ...

Anyway, I posted this problem in the Google Group (https://groups.google.com/a/ucar.edu/g/wrf-chem-bio_emiss/c/o0F-vj1QP-8/m/UkKZDwPcCQAJ) two years ago. I hope this post lets more people know the trick.
 
Hi Xin,

Are you saying the problem is fixed, or is it still hanging with anthro input? I don't see anything wrong with your namelist. As for the wrfbiochemi file, in the past I've always just created one for all 12 months, but I don't know how that could be related to your anthro_emis files. Have you tried turning on emissions one domain at a time?

Jordan
 
Hi Jordan,

This problem was fixed by changing start_lai_mnth and end_lai_mnth.

And yes, I tried turning on emissions for one domain, and that worked.
I guess the wrfbio* file for domain 3 was faulty because of the short time span it covered.

Here's the useful reply from Gabriele in the Google Group:
I just want to add some more information to the creation of the wrfbiochemi file. When running WRF-Chem with MEGAN online biogenic emissions you need the input fields in the wrfbiochemi file. These include time-independent isoprene emissions factors and climatological monthly weather parameters and LAI. The latter are needed because biogenic emissions not only depend on the current weather conditions but also on the weather and LAI of the previous month (this is discussed in the README file). Hence, if your simulation is for May and June, you have to create your wrfbiochemi file for Apr-June, i.e. set start_lai_mnth = 4 and end_lai_mnth = 6.
Gabriele
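
Gabriele's rule can be written down as a small helper: start one month before the first simulated month and end at the last simulated month. The sketch below is only illustrative. `lai_helper` and `lai_months` are hypothetical names, and falling back to all 12 months when the run starts in January or crosses a year boundary is an assumption based on Jordan's remark about creating the file for all 12 months.

```fortran
module lai_helper
  implicit none
  private
  public :: lai_months

contains

  ! Given the first and last calendar months (1-12) covered by a run,
  ! suggest values for start_lai_mnth and end_lai_mnth.
  pure subroutine lai_months(sim_start, sim_end, start_lai, end_lai)
    integer, intent(in)  :: sim_start  ! first simulated month
    integer, intent(in)  :: sim_end    ! last simulated month
    integer, intent(out) :: start_lai  ! suggested start_lai_mnth
    integer, intent(out) :: end_lai    ! suggested end_lai_mnth

    if (sim_start <= 1 .or. sim_end < sim_start) then
      ! The previous month lies in the preceding year, or the run
      ! crosses a year boundary: cover the full year (assumption).
      start_lai = 1
      end_lai   = 12
    else
      start_lai = sim_start - 1        ! include the previous month
      end_lai   = sim_end
    end if
  end subroutine lai_months

end module lai_helper
```

For the run in this thread (31 August to 1 September), `call lai_months(8, 9, s, e)` gives 7 and 9, the values that fixed the hang; Gabriele's May to June example gives 4 and 6.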

Xin
 