WRF debug mode: floating divided by zero, netcdf issue

Mengjuan Liu

I'm running WRF V4.4 in debug mode (built with configure -D). The normal build without -D runs smoothly, but the debug build crashes when executing ideal.exe, real.exe, or wrf.exe (with correct input and boundary files), giving the following error:
forrtl: error (73): floating divide by zero
Image PC Routine Line Source
wrf.exe 000000000D407D9D Unknown Unknown Unknown
wrf.exe 000000000D405C37 Unknown Unknown Unknown
wrf.exe 000000000D39BBE4 Unknown Unknown Unknown
wrf.exe 000000000D39B9F6 Unknown Unknown Unknown
wrf.exe 000000000D324826 Unknown Unknown Unknown
wrf.exe 000000000D32BA26 Unknown Unknown Unknown
Unknown 00002AC5463F8630 Unknown Unknown Unknown
Unknown 00002AC5424392A9 Unknown Unknown Unknown
libnetcdf.so.11 00002AC542437B29 Unknown Unknown Unknown
libnetcdf.so.11 00002AC54242BA3D Unknown Unknown Unknown
libnetcdf.so.11 00002AC54242B983 Unknown Unknown Unknown
libnetcdf.so.11 00002AC54238C180 Unknown Unknown Unknown
libnetcdff.so.6 00002AC541E7F038 Unknown Unknown Unknown
wrf.exe 000000000D29D5BB ext_ncd_open_for_ 1986 wrf_io.f
wrf.exe 000000000427F2FA module_io_mp_wrf_ 18286 module_io.f90
wrf.exe 0000000003967364 module_io_domain_ 58 module_io_domain.f90
wrf.exe 0000000003E2A347 open_hist_w_ 2105 mediation_integrate.f90
wrf.exe 0000000003E1AAC4 med_hist_out_ 900 mediation_integrate.f90
wrf.exe 0000000003DEC0F1 med_before_solve_ 63 mediation_integrate.f90
wrf.exe 000000000055F62F module_integrate_ 325 module_integrate.f90
wrf.exe 0000000000411000 module_wrf_top_mp 338 module_wrf_top.f90
wrf.exe 0000000000410215 MAIN__ 30 wrf.f90
wrf.exe 00000000004100DE Unknown Unknown Unknown
libc.so.6 00002AC54682B555 Unknown Unknown Unknown
wrf.exe 000000000040FFE9 Unknown Unknown Unknown
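
A traceback like this is easier to read once the frames with source information are separated from the "Unknown" ones. The following is a minimal Python sketch, not part of WRF or the Intel tools, that finds the first frame with a known line number and lists the library images the call passed through above it. The helper names (parse_frames, summarize) are hypothetical, and the sample rows are copied from the traceback above.

```python
import re

# One traceback row: image, 16-digit hex PC, routine, line (or "Unknown"), source.
FRAME = re.compile(r"^(\S+)\s+([0-9A-Fa-f]{16})\s+(\S+)\s+(\d+|Unknown)\s+(\S+)$")

# A few rows copied from the traceback above (not the full listing).
SAMPLE = """\
forrtl: error (73): floating divide by zero
Image PC Routine Line Source
wrf.exe 000000000D407D9D Unknown Unknown Unknown
Unknown 00002AC5463F8630 Unknown Unknown Unknown
libnetcdf.so.11 00002AC542437B29 Unknown Unknown Unknown
libnetcdff.so.6 00002AC541E7F038 Unknown Unknown Unknown
wrf.exe 000000000D29D5BB ext_ncd_open_for_ 1986 wrf_io.f
wrf.exe 000000000427F2FA module_io_mp_wrf_ 18286 module_io.f90
wrf.exe 0000000000410215 MAIN__ 30 wrf.f90
"""


def parse_frames(text):
    """Return one dict per traceback row, in the order printed (innermost first)."""
    frames = []
    for raw in text.splitlines():
        match = FRAME.match(raw.strip())
        if match is None:
            continue  # skips the error message and the column header
        image, _pc, routine, line, source = match.groups()
        frames.append({
            "image": image,
            "routine": routine,
            "line": None if line == "Unknown" else int(line),
            "source": None if source == "Unknown" else source,
        })
    return frames


def summarize(frames):
    """Find the first frame with a line number and the other images above it."""
    for index, frame in enumerate(frames):
        if frame["line"] is None:
            continue
        libraries = []
        for above in frames[:index]:
            image = above["image"]
            if image not in ("Unknown", frame["image"]) and image not in libraries:
                libraries.append(image)
        return frame, libraries
    return None, []


frame, libraries = summarize(parse_frames(SAMPLE))
print(frame["routine"], frame["line"], frame["source"])
print(libraries)
```

On the sample rows, the first frame with source information is ext_ncd_open_for_ in wrf_io.f, and the frames above it belong to libnetcdff and libnetcdf. This matches the observation below that the exception is raised inside the netCDF libraries rather than in WRF's own code.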


With the help of Allinea DDT, I found the core error happens when calling netcdf functions in the subroutines in external/io_netcdf/wrf_io.F90:

Line 1443 SUBROUTINE ext_ncd_open_for_write_commit(DataHandle, Status)
...
stat = NF_ENDDEF(DH%NCID)
call netcdf_err(stat,Status)
...

and

Line 1490 subroutine ext_ncd_ioclose(DataHandle, Status)
...
stat = NF_CLOSE(DH%NCID)
call netcdf_err(stat,Status)

After commenting out these two subroutines, I can run wrf.exe and ideal.exe without errors, but they no longer write any data out (to wrfinput_d01 or wrfout_...).
On the other hand, I cannot modify the netCDF functions themselves, as they belong to the prebuilt libraries.

The running environment:

intel/2016.4.072
hdf5/1.8.18-intel-s
jasper/2.0.14/
netcdf/4.4.1.1-intel-s (/global/software/sl-7.x86_64/modules/intel/2016.4.072/netcdf/4.4.1.1-intel-s)
openmpi/3.0.1-intel
lapack/3.8.0
cmake/3.7.2

I plan to use debug mode and a debugger to debug my parameterization schemes, but at the moment even the unmodified code does not run in debug mode. What should I do now?
Thanks!
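
One possible reading of why the normal build runs while the -D build crashes, offered as an assumption rather than something confirmed in this thread: if the debug configuration turns on floating-point exception trapping (for Intel compilers this is typically done with a flag such as -fpe0; the FCDEBUG line in configure.wrf would show whether it is set), then a divide by zero that would otherwise quietly yield Inf becomes a fatal error. The sketch below only illustrates the difference between a masked and a trapped divide by zero using Python's decimal module. It is an analogy, not WRF or netCDF code, and the divide helper is hypothetical.

```python
from decimal import Context, Decimal, DivisionByZero


def divide(numerator, denominator, trap):
    """Divide under a context that either traps or masks divide-by-zero."""
    traps = [DivisionByZero] if trap else []
    ctx = Context(traps=traps)
    return ctx.divide(Decimal(numerator), Decimal(denominator))


# Masked: the operation completes and returns a special value.
masked = divide(1, 0, trap=False)
print("masked result:", masked)

# Trapped: the same operation raises instead of returning.
try:
    divide(1, 0, trap=True)
    trapped = False
except DivisionByZero:
    trapped = True
print("trapped:", trapped)
```

With the trap masked, the division returns Infinity and execution continues; with the trap enabled, the same division raises an exception. Under the assumption above, the same distinction would explain why the crash appears only in the debug build.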
 
Hi,
Before we dig into this, can you first try this with the latest version of WRF (V4.5.2) to determine if it's potentially a code issue that may have been resolved since v4.4? Thanks!
 
Thanks for the reply!
I tried this with WRF V4.5.2 using DDT, and it reported the same error from the same netCDF function, as follows:
[Screenshots: 1708565787670.png, Screen Shot 2024-02-21 at 5.37.41 PM.png]
 
Thank you for trying that. Can you please package up all of your rsl* files into a single *.tar file and attach that, along with the namelist.input file you used for this case? Thanks!
 
The rsl and namelist files for two cases, both run in serial, are compressed into the attached file:
1) a real-data case run in WRF V4.4, from the online tutorial case (Hurricane Matthew), with some additional output containing "Julia" in rsl.out.0000 for my own debugging;
2) an idealized case run in WRF V4.5.2, with both namelist.input and input_sounding taken from the SGP case.
Thanks!
 

Attachments

  • WRFdebugmode.tar
    330 KB
Thanks for sending that information. I just tried this out with a generic case I have, and then again with the online tutorial (hurricane Matthew) single domain case. I compiled V4.4 with configure -D and both cases run to completion without any issues. I wonder if this is possibly specific to your environment or compile. To narrow this down, let's just focus on the V4.4 real-data case. Can you share the following?
1) configure.wrf and compile log
2) your met_em* files for the case so I can try to run with your specific input - these will likely be too large to attach, so take a look at the home page of this forum for instructions on sharing large files.

I assume you didn't make any modifications to the code, correct?
 
Thanks for looking into the bug! I agree it is largely due to my environment, as DDT leads me to the netCDF functions in the netcdf-fortran library (/global/software/sl-7.x86_64/modules/intel/2016.4.072/netcdf/4.4.1.1-intel-s).
The configure.wrf and compile log are attached, and the met_em files have been uploaded to Nextcloud as met_em(3).tar. I'm sure I didn't make any modifications to the code.
 

Attachments

  • compile_log.tar
    1 MB
Thank you for sharing that, and apologies for the delay in responding. I tried a simulation with the files you shared. I only ran a 12-hour simulation, but it ran without issues. Since yours stopped immediately, my run does not appear to be hitting the failure you are seeing. One thing I notice is that you're using a fairly old version of the Intel compiler (V16), while I'm using V23. If you're able to update to a newer version of Intel, that may help.
 
Thanks for trying that out! After the system's Intel compiler was updated to 2022, with corresponding netcdf and hdf5 builds, the errors did disappear!
 