Program stops in FTUV module with anthropogenic file after running a few hours

zxdawn
Posts: 30
Joined: Tue Dec 11, 2018 2:34 am

Post by zxdawn » Mon May 18, 2020 1:40 pm

This issue was also posted in the Google group: https://groups.google.com/a/ucar.edu/fo ... Dsk5McDeqg.

I have copied it here in case anyone has ideas.

I'm using ndown to run WRF-Chem and get the error for domain 2.
I also tested without ndown and got the same error.

Here's the end of rsl.error.0000:

Code: Select all

d01 2019-07-25_01:00:00            2 input_wrf: wrf_get_next_time current_date: 2019-07-25_01:00:00 Status =            0
**WARNING** Time in input file not being checked **WARNING**
input_wrf.F reading 4d real E_CO
 date 2019-07-25_01:00:00
 ds            1           1           1
 de          150           1         150
 ps            1           1           1
 pe           15           1           7
 ms           -4           1          -4
 me           22           1          14
d01 2019-07-25_01:00:00 module_io.F: in wrf_read_field

Although the job log points to an error in rsl.error.0344, I suspect the failure is related to reading the anthropogenic emission file,
because the run works well if I turn auxinput_5 off.

Anyway, here's the end of rsl.error.0344:

Code: Select all

d01 2019-07-25_00:59:50 calling photolysis driver
 photolysis_driver: called for domain            1
d01 2019-07-25_00:59:50 calling ftuv_driver
d01 2019-07-25_00:59:50 no aerosols initialization yet
no aerosols initialization yet
forrtl: severe (174): SIGSEGV, segmentation fault occurred

I tried the same namelist and emission files with WRF-Chem V3.7.1 instead of WRF-Chem V4.1.4, and the run works well.

As mentioned before, the run also works well in WRF-Chem V4.1.4 if I turn off the anthropogenic emission input.

So I suppose some difference between the two versions, related to reading the anthropogenic emissions, leads to the failure.
Attachments: namelist.input (7.17 KiB)
Last edited by zxdawn on Tue May 19, 2020 1:11 pm, edited 1 time in total.

jordanschnell
Posts: 29
Joined: Thu Feb 27, 2020 10:58 pm

Post by jordanschnell » Mon May 18, 2020 5:41 pm

Hi,

Could you try running with a larger value for debug_level to see exactly where the model is stopping? If possible, and if you believe your problem lies with emissions, please add some write(*,*) statements in module_emissions_anthropogenics.F around these lines and recompile:

!
! add emissions
!
do j=jts,jte
   k_loop: DO k=kts,min(config_flags%kemit,kte)
      conv_rho(its:ite) = 4.828e-4/rho_phy(its:ite,k,j)*dtstep/(dz8w(its:ite,k,j)*60.)
      conv_rho_aer(its:ite) = alt(its:ite,k,j)*dtstep/dz8w(its:ite,k,j)
      chem(its:ite,k,j,p_so2) = chem(its:ite,k,j,p_so2) + emis_ant(its:ite,k,j,p_e_so2)*conv_rho(its:ite)
      chem(its:ite,k,j,p_co) = chem(its:ite,k,j,p_co) + emis_ant(its:ite,k,j,p_e_co)*conv_rho(its:ite)
      chem(its:ite,k,j,p_no) = chem(its:ite,k,j,p_no) + emis_ant(its:ite,k,j,p_e_no)*conv_rho(its:ite)
      chem(its:ite,k,j,p_no2) = chem(its:ite,k,j,p_no2) + emis_ant(its:ite,k,j,p_e_no2)*conv_rho(its:ite)
      chem(its:ite,k,j,p_nh3) = chem(its:ite,k,j,p_nh3) + emis_ant(its:ite,k,j,p_e_nh3)*conv_rho(its:ite)
      chem(its:ite,k,j,p_hcl) = chem(its:ite,k,j,p_hcl) + emis_ant(its:ite,k,j,p_e_hcl)*conv_rho(its:ite)
      chem(its:ite,k,j,p_ch3cl) = chem(its:ite,k,j,p_ch3cl) + emis_ant(its:ite,k,j,p_e_ch3cl)*conv_rho(its:ite)
      is_mozart: if( is_moz_chm ) then
         chem(its:ite,k,j,p_bigalk) = chem(its:ite,k,j,p_bigalk) + emis_ant(its:ite,k,j,p_e_bigalk)*conv_rho(its:ite)
         chem(its:ite,k,j,p_bigene) = chem(its:ite,k,j,p_bigene) + emis_ant(its:ite,k,j,p_e_bigene)*conv_rho(its:ite)


Jordan

Post by zxdawn » Tue May 19, 2020 8:08 am

Hi Jordan,

I increased debug_level to 1000 and found that something goes wrong in the FTUV module.
So I added some debug messages around the place where the error happens:

Filename:

Code: Select all

./chem/module_ftuv_subs.F
Edit:

Code: Select all

!-----------------------------------------------------------------------------
! ... calculate sum of exponentials (eqs 7 and 8 of kockarts 1994)
!-----------------------------------------------------------------------------
      call wrf_message('Xin: begin calculate sum of exponentials')
      do iw = 1,ngast-1
         do k = 1,nz
            ki = index(k)
            call wrf_message('****************************')
            write(err_msg,*) 'Xin: ki ',ki
            call wrf_message(err_msg)
            write(err_msg, *) 'Xin: x_table ',x_table(ki,iw)
            call wrf_message(err_msg)
            write(err_msg, *) 'Xin: dels ',dels(k)
            call wrf_message(err_msg)
            write(err_msg, *) 'Xin: diff_x ',x_table(ki+1,iw) - x_table(ki,iw)
            call wrf_message(err_msg)
            rjm(k) = x_table(ki,iw) + dels(k)*(x_table(ki+1,iw) - x_table(ki,iw))
            write(err_msg, *) 'Xin: d_table ',d_table(ki,iw)
            call wrf_message(err_msg)
            write(err_msg, *) 'Xin: diff_d ',d_table(ki+1,iw) - d_table(ki,iw)
            call wrf_message(err_msg)
            rjo2(k) = d_table(ki,iw) + dels(k)*(d_table(ki+1,iw) - d_table(ki,iw))
            call wrf_message('****************************')
         end do
Here's the error in the job log:

Code: Select all

yhrun: error: cn11772: task 98: Exited with exit code 174
yhrun: First task exited 60s ago
yhrun: tasks 0-97,99-119: running
yhrun: task 98: exited abnormally
And this is the error mentioned in rsl.error.0098:

Code: Select all

Xin: begin calculate sum of exponentials
****************************
 Xin: ki    504216597
forrtl: severe (174): SIGSEGV, segmentation fault occurred

The strange thing is that the message I added just below the last one printed, where the error happened, does not involve any calculation.
Why would the segmentation fault appear around that line?
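
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the statement right after the ki message reads x_table(ki,iw) to build the next message, so an index of 504216597 would already be far outside the table at that point. The standalone program below assumes the first table dimension equals tdim = 501 (the parameter declared in chem/module_ftuv_subs.F). The second dimension ncol, the program name, and the range check itself are hypothetical illustrations, not code from WRF-Chem.

```fortran
program guard_table_index
  implicit none
  ! Assumption: the first dimension of the lookup table is tdim = 501,
  ! the parameter declared in chem/module_ftuv_subs.F.
  ! ncol is a hypothetical stand-in for the second dimension.
  integer, parameter :: tdim = 501
  integer, parameter :: ncol = 4
  real    :: x_table(tdim, ncol)
  integer :: ki, iw

  x_table = 1.0
  iw = 1
  ki = 504216597   ! value printed in rsl.error.0098 before the crash

  ! The interpolation also reads x_table(ki+1, iw), so the usable
  ! range for ki is 1 .. tdim-1.
  if (ki < 1 .or. ki > tdim - 1) then
     write(*,*) 'table index out of range: ki = ', ki, ' valid range: 1 ..', tdim - 1
  else
     write(*,*) 'x_table(ki,iw) = ', x_table(ki, iw)
  end if
end program guard_table_index
```

Compiling with run-time bounds checking (for example gfortran's -fcheck=bounds or ifort's -check bounds) should flag the same condition at the offending statement instead of producing a bare segmentation fault.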

As mentioned before, the same settings and data files did not cause this error in WRF-Chem V3.7.1.

Regards,
Xin
Attachments: rsl.error.0098.tar.gz (error information, 53.83 MiB)

Post by zxdawn » Tue May 19, 2020 1:04 pm

Hi Jordan,

Sorry for the long error log. I set debug_level to 0 and wrote only the necessary information to the newly attached log file.

Actually, the issue is in the initialization of the table interpolation in chem/module_ftuv_subs.F.
I attach the GitHub link to the corresponding lines here.

I added output of ki and index(ki) like this:

Code: Select all

            index(ki) = 0
            dels(ki) = 0._dp
         end if
         write(err_msg,*) 'Xin: ki, index(ki) ', ki, index(ki)
         call wrf_message(err_msg)
      end do
As you can see in the error log, the first two values are very large:

Code: Select all

Xin: start initialize the table interpolation
 Xin: ki, index(ki)            1  1650901209
 Xin: ki, index(ki)            2  1070446359
 Xin: ki, index(ki)            3         274
 Xin: ki, index(ki)            4         274
 Xin: ki, index(ki)            5         273

This leads to the very large ki value in the loop that calculates the sum of exponentials:

Code: Select all

Xin: begin calculate sum of exponentials
****************************
 Xin: k, ki            1  1650901209

However, if I check the initialization of the table interpolation at earlier time steps, index(ki) has a reasonable value for every ki:

Code: Select all

Xin: start initialize the table interpolation
 Xin: ki, index(ki)            1         275
 Xin: ki, index(ki)            2         274
 Xin: ki, index(ki)            3         274
 Xin: ki, index(ki)            4         274
 Xin: ki, index(ki)            5         273
 .....
 
This is strange. Could it be caused by the machine or by something else?

Regards,
Xin
Attachments: rsl.error_debug0.0098.tar.gz (1.89 MiB)

Post by zxdawn » Tue May 19, 2020 1:26 pm

I suppose the simplest fix is to add this line inside the loop that initializes the table interpolation:

Code: Select all

index(ki) = min(index(ki),tdim-1)

But I'm not sure where tdim gets its value.
The definition of tdim in chem/module_ftuv_subs.F is:

Code: Select all

integer, private, parameter  :: tdim = 501

Although the largest reasonable value of index(ki) in the log is 275, I couldn't find any reassignment of tdim in the code.
That should be the key to this issue.
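
A minimal sketch of what the proposed clamp would do to the values seen in the log, assuming index(ki) has been left holding an arbitrary value. Here idx is a hypothetical stand-in for index(:), and the negative entry is invented to show a case the upper clamp alone would not catch.

```fortran
program clamp_index
  implicit none
  integer, parameter :: tdim = 501   ! as declared in chem/module_ftuv_subs.F
  integer :: idx(4)                  ! hypothetical stand-in for index(:)
  integer :: ki

  ! The first two values are the ones printed in the error log, 274 is a
  ! normal value from the same log, and -7 is an invented example of
  ! negative garbage.
  idx = (/ 1650901209, 1070446359, 274, -7 /)

  do ki = 1, size(idx)
     idx(ki) = min(idx(ki), tdim - 1)   ! the clamp proposed above
     write(*,*) 'ki, clamped index: ', ki, idx(ki)
  end do
end program clamp_index
```

The two large values become tdim-1 = 500, 274 is unchanged, and the negative value passes through untouched. The clamp would keep the later table lookup in bounds for large values, but it does not explain why index(ki) was never set, so it may hide the underlying problem rather than fix it.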

Xin

Post by zxdawn » Tue May 19, 2020 2:40 pm

Update: I checked tdim; it is 501 everywhere in the loop.

Post by jordanschnell » Tue May 19, 2020 3:03 pm

Hi Xin,

I will look into the possible issue with this piece of code and get back to you. First, are you supplying the model with the exo_coldens file (https://www.acom.ucar.edu/wrf-chem/MOZC ... sGuide.pdf)?

Jordan

Post by zxdawn » Tue May 19, 2020 3:08 pm

Hi Jordan,

Thanks for your help. I have supplied these exo* files.

Regards,
Xin

Post by jordanschnell » Tue May 19, 2020 3:56 pm

Hi Xin,

Just wanted to make sure. It seems the error always occurs at ki = 1 or 2, correct?

I have two thoughts. The first is to add print statements for ki <= 2 that show what lo2col(ki) is and what the o2_table values are when the index is chosen.

The other is that it is a memory issue, and either "index" or "tdim" is being overwritten somewhere, though you are correct that tdim is not assigned elsewhere.

I will keep looking.

Jordan

Post by zxdawn » Tue May 19, 2020 4:07 pm

Hi Jordan,

Yes, the error always occurs at ki = 1 or 2.

Tomorrow I will add print statements for ki <= 2 that show what lo2col(ki) is and what the o2_table values are, and I will let you know the results.

Regards,
Xin

Post by zxdawn » Wed May 20, 2020 2:02 am

Hi Jordan,

The values of o2col(ki) and lo2col(ki) are NaN when ki = 1 or 2, while o2_table(1) and o2_table(tdim) are 22 and 27, respectively.
As a result, index(ki) is never assigned.
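
Why a NaN leaves index(ki) unassigned can be sketched as follows, assuming IEEE 754 comparison semantics (which aggressive floating-point optimization flags may relax). NaN /= 0 is true, so the outer if branch is entered, but every <= or >= test against the table values is false, so none of the assignments to index(ki) runs and it keeps whatever value was already in memory. The kind parameter dp and the names lo and hi are assumptions made for this standalone example. ieee_is_nan from the intrinsic ieee_arithmetic module is one way to detect the condition explicitly.

```fortran
program nan_skips_branches
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, ieee_is_nan
  implicit none
  integer, parameter :: dp = selected_real_kind(14, 300)  ! assumed double precision kind
  ! lo and hi stand in for o2_table(1) and o2_table(tdim) as reported in the log.
  real(dp), parameter :: lo = 22._dp, hi = 27._dp
  real(dp) :: o2col, lo2col

  o2col  = ieee_value(o2col, ieee_quiet_nan)   ! mimic the NaN seen at ki = 1 and 2
  lo2col = log10(o2col)                        ! NaN propagates through log10

  ! Under IEEE semantics the first test is true and the next two are false.
  write(*,*) 'o2col /= 0   : ', o2col /= 0._dp
  write(*,*) 'lo2col <= lo : ', lo2col <= lo
  write(*,*) 'lo2col >= hi : ', lo2col >= hi
  ! An explicit NaN test that could guard the interpolation setup.
  write(*,*) 'ieee_is_nan  : ', ieee_is_nan(o2col)
end program nan_skips_branches
```

This only shows how the unassigned index arises. Where the NaN in scol comes from is a separate question that the check does not answer.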

Here is how o2col is assigned:

Code: Select all

o2col(1:nz) = 0.2095_dp * scol(1:nz)

scol, in turn, is computed by subroutine airmas.

Here's the edited chem/module_ftuv_subs.F:

Code: Select all

!-----------------------------------------------------------------------------
! ... initialize the table interpolation
!-----------------------------------------------------------------------------
      call wrf_message('Xin: start initialize the table interpolation')
      where( o2col(:) /= 0 )
         lo2col(:) = log10( o2col(:) )
      endwhere
      do ki = 1,nz
         if( o2col(ki) /= 0._dp ) then
            write(err_msg,*) 'Xin: ki, o2col(ki) ', ki, o2col(ki)
            call wrf_message(err_msg)
            write(err_msg,*) 'Xin:lo2col(ki) ', lo2col(ki)
            call wrf_message(err_msg)
            write(err_msg,*) 'Xin: o2_table(1), o2_table(tdim) ', o2_table(1), o2_table(tdim)
            call wrf_message(err_msg)
            if( lo2col(ki) <= o2_table(1) ) then
               write(err_msg,*) 'Xin: lo2col(ki) <= o2_table(1)'
               call wrf_message(err_msg)
               dels(ki) = 0._dp
               index(ki) = 1
            else if( lo2col(ki) >= o2_table(tdim) ) then
               write(err_msg,*) 'Xin: lo2col(ki) >= o2_table(tdim)'
               call wrf_message(err_msg)
               dels(ki) = 1._dp
               index(ki) = tdim-1
            else
               do k = 2,tdim
                  if( lo2col(ki) <= o2_table(k) ) then
                     write(err_msg,*) 'Xin: lo2col(ki) <= o2_table(k)'
                     call wrf_message(err_msg)
                     write(err_msg,*) 'Xin: k, t_fac ', k, t_fac
                     call wrf_message(err_msg)
                     write(err_msg,*) 'Xin: lo2col(ki), o2_table(k-1) ', lo2col(ki), o2_table(k-1)
                     call wrf_message(err_msg)
                     dels(ki) = t_fac*(lo2col(ki) - o2_table(k-1))
                     index(ki) = k-1
                     exit
                  end if
               end do
            end if
         else
            index(ki) = 0
            dels(ki) = 0._dp
         end if
         write(err_msg,*) 'Xin: ki, index(ki), tdim ', ki, index(ki), tdim
         call wrf_message(err_msg)
      end do
      call wrf_message('Xin: end initialize the table interpolation')
Regards,
Xin
Attachments: rsl.error.0098_o2col.tar.gz (5.26 MiB)

Post by jordanschnell » Wed May 20, 2020 3:26 pm

Hi Xin,

Thanks for doing that - I'm happy to continue down the rabbit hole on this subroutine - but what do you think could be causing the difference between the two WRF-Chem versions, and between running with anthropogenic emissions on vs. off? It doesn't look like this subroutine has been touched between 3.7 and 4.1.

Post by zxdawn » Thu May 21, 2020 2:13 am

Hi Jordan,

It's difficult to compare two different versions of WRF-Chem.

Anyway, I tried running V4.1.4 with the wrfinput and wrfbdy files generated by V3.7.1, while keeping the same emission files as in the V4.1.4 run.
The settings I needed to change in namelist.input are:

Code: Select all

 &time_control
 force_use_old_data                  = .true.,

 &dynamics
 hybrid_opt                          = 0,
 use_theta_m                         = 0,

With these settings, the run works well.
So I guess the problem may be caused by the different vertical coordinate?

I'll disable hybrid_opt and use_theta_m one at a time to check which one is the cause.

Regards,
Xin

Post by zxdawn » Thu May 21, 2020 7:59 am

Hi Jordan,

I tested three options for V4.1.4:

Option_1:

Code: Select all

 hybrid_opt                          = 0,
 use_theta_m                         = 0,
Option_2:

Code: Select all

 use_theta_m                         = 0,
Option_3:

Code: Select all

 hybrid_opt                          = 0,

Both Option_1 and Option_2 work well, so the problem should be related to use_theta_m.

Post by jordanschnell » Thu May 21, 2020 3:01 pm

Hi Xin,

Yes, that seems to be the case - nice sleuthing on your part. I am not well-versed in the transition of the vertical coordinates, but I do know that it can cause issues, and it makes sense that it would occur for levels at the boundary. To be sure, you could create the met_em files (and then wrfbdy, wrfinput) using only v4 WPS/WRF. Glad you were able to solve this!

Cheers,

Jordan
