
pnetCDF capability for WPS


Hi all,
I have a very large domain with 5000x5000 grid points and want to create met_em files using ECMWF IFS input data; the large number of vertical levels makes things even more difficult.

Creating met_em files works but takes a lot of I/O time with io_form_metgrid = 2, which writes out (compressed) netCDF4 files. I can speed up the output using io_form_metgrid = 102. However, running (a modified version of) the netCDF joiner to merge the small netCDF files written by each MPI rank also takes ages (>30 min per timestep), although it eventually creates the correct met_em files.
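For reference, the switch in question lives in the &metgrid record of namelist.wps; the snippet below is just an illustration of the values discussed here (2 = netCDF, 102 = one tile per MPI rank, 11 = pnetCDF):
Code:
&metgrid
 io_form_metgrid = 102,
/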

I want to speed things up and add pnetCDF capability to WPS/metgrid (io_form_metgrid = 11). I have already modified the pnetCDF implementation in the WRF installation to write CDF5 files, to get around the size restrictions of the 64-bit-offset format that pnetCDF otherwise writes for such large domains. I have also implemented all the necessary parts in WPS so that it compiles with pnetCDF linked.
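Roughly the kind of change I mean, as a sketch only (the exact file and the variable names are placeholders and differ between WRF versions): assuming the file is created via nfmpi_create somewhere under WRF's external/io_pnetcdf, requesting NF_64BIT_DATA instead of NF_64BIT_OFFSET makes pnetCDF write CDF-5 rather than CDF-2:
Code:
! Sketch only: switch the pnetCDF create mode from CDF-2 (64-bit offset)
! to CDF-5 (64-bit data) so the per-variable size limits no longer apply.
! Old: stat = nfmpi_create(Comm, FileName, IOR(NF_CLOBBER, NF_64BIT_OFFSET), MPI_INFO_NULL, NCID)
stat = nfmpi_create(Comm, FileName, IOR(NF_CLOBBER, NF_64BIT_DATA), MPI_INFO_NULL, NCID)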

I modified metgrid/src/output_module.F in several places, including this one:
Code:
#ifdef IO_BINARY
               if (io_form_output == BINARY) then
                  call ext_int_write_field(handle, datestr, trim(fields(i)%fieldname), &
                       real_dom_array, WRF_REAL, comm_1, comm_2, domain_desc, trim(fields(i)%mem_order), &
                       trim(fields(i)%stagger), fields(i)%dimnames, sd, ed, sm, em, sp, ep, istatus)
               end if
#endif
#ifdef IO_NETCDF
               if (io_form_output == NETCDF) then
                  call ext_ncd_write_field(handle, datestr, trim(fields(i)%fieldname), &
                       real_dom_array, WRF_REAL, comm_1, comm_2, domain_desc, trim(fields(i)%mem_order), &
                       trim(fields(i)%stagger), fields(i)%dimnames, sd, ed, sm, em, sp, ep, istatus)
               end if
#endif
#ifdef IO_PNETCDF
               if (io_form_output == PNETCDF) then
                  call ext_pnc_write_field(handle, datestr, trim(fields(i)%fieldname), &
                       real_dom_array, WRF_REAL, comm_1, comm_2, domain_desc, trim(fields(i)%mem_order), &
                       trim(fields(i)%stagger), fields(i)%dimnames, sd, ed, sm, em, sp, ep, istatus)
               end if
#endif
#ifdef IO_GRIB1
               if (io_form_output == GRIB1) then
                  call ext_gr1_write_field(handle, datestr, trim(fields(i)%fieldname), &
                       real_dom_array, WRF_REAL, comm_1, comm_2, domain_desc, trim(fields(i)%mem_order), &
                       trim(fields(i)%stagger), fields(i)%dimnames, sd, ed, sm, em, sp, ep, istatus)
               end if
#endif
               call mprintf((istatus /= 0),ERROR,'Error in ext_pkg_write_field')

This resulted in:
Code:
Abort(805912325) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(106): MPI_Comm_rank(comm=0x1, rank=0x7fffc89053cc) failed
PMPI_Comm_rank(63).: Invalid communicator

For debugging I changed the communicators passed to the ext_pnc_* routines (shown here for ext_pnc_open_for_write) to MPI_COMM_WORLD:
Code:
#ifdef IO_PNETCDF
            if (io_form_output == PNETCDF) then
               call ext_pnc_open_for_write(trim(output_fname), MPI_COMM_WORLD, MPI_COMM_WORLD, 'sysdep info', handle, istatus)
               ! call ext_pnc_open_for_write(trim(output_fname), comm_1, comm_2, 'sysdep info', handle, istatus)
            end if
#endif

Running metgrid in serial mode (./metgrid.exe) and with mpirun -np 1 ./metgrid.exe now works and files are created via pnetCDF in CDF5 format. All works correctly.
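(As a quick check of the format, ncdump -k on one of the output files reports the file kind; for CDF5 it shows "cdf5" or "64-bit data", depending on the netCDF version. The filename below is just a placeholder.)
Code:
ncdump -k met_em.d01.<date>.nc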

However, when using mpirun with more than one rank, execution stalls at the log message "--- Initializing output module."
So it seems that MPI_COMM_WORLD is not the right communicator here. In the WRF source code I found that grid%communicator is what gets passed to ext_pnc_open_for_write.

This looks like an issue with the MPI communicator passed to the subroutine. Any advice on which communicator WPS should pass in the call to ext_pnc_open_for_write (and similar routines)? Any help is much appreciated!
I am willing to contribute the pnetCDF capability for WPS once it is fully working with more than one MPI rank.
 
Interesting idea.

It might also be worth posting this in the Issues section of the WPS GitHub repository.

 
You could try keeping io_form = 102 for running real and wrf, without merging the tiled output from metgrid. You may have to use the same number of processors for all of these programs.
 
Hi,
I already tried that, but there seems to be a bug when writing the met_em tiles. Let me describe it below and add a hotfix I just found.
In namelist.wps I have
Code:
io_form_metgrid = 102
and after running on 256 MPI procs I get the correct files met_em.d0*_0000 to met_em.d0*_0255.

I link them and run real.exe with
Code:
io_form_auxinput1 = 102
io_form_input = 11
also on 256 MPI procs. However, real.exe stops almost immediately after reading a few variables.
In rsl.error.0000 or rsl.out.0000 I cannot find an error, but ALL other rsl files contain:
Code:
taskid: 1 hostname: mmnode001
 module_io_quilt_old.F        2931 T
 Ntasks in X           16 , ntasks in Y           16
  Domain # 1: dx =  5000.000 m
  Domain # 2: dx =  1000.000 m
REAL_EM V4.3.1 PREPROCESSOR
No git found or not a git repository, git commit version not available.
 *************************************
 Parent domain
 ids,ide,jds,jde            1        1200           1        1075
 ims,ime,jms,jme           69         157          -4          75
 ips,ipe,jps,jpe           76         150           1          68
 *************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
   alloc_space_field: domain            1 ,              657244104  bytes allocated
d01 2023-02-23_00:00:00  Yes, this special data is acceptable to use: OUTPUT FROM METGRID V4.3.1
d01 2023-02-23_00:00:00  Input data is acceptable to use:
 metgrid input_wrf.F first_date_input = 2023-02-23_00_00_00
 metgrid input_wrf.F first_date_nml = 2023-02-23_00:00:00
----------------- ERROR -------------------
namelist    : NUM_LAND_CAT =         21
input files : NUM_LAND_CAT =         24 (from geogrid selections).
d01 2023-02-23_00:00:00 ---- ERROR: Mismatch between namelist and wrf input files for dimension NUM_LAND_CAT
NOTE:       1 namelist vs input data inconsistencies found.
-------------- FATAL CALLED ---------------

Checking met_em.d01.2023-02-23_00_00_00.nc_0000 and met_em.d01.2023-02-23_00_00_00.nc_0001 shows that both "fit" together, and all tiles look good after manual inspection. However, checking the global attributes of the individual met_em netCDF files shows that tile _0000 for all time steps has
Code:
// global attributes:
...
        :MAP_PROJ = 1 ;
        :MMINLU = "USGS" ;
        :NUM_LAND_CAT = 28 ;
        :ISWATER = 16 ;
        :ISLAKE = 28 ;
        :ISICE = 24 ;
        :ISURBAN = 1 ;
        :ISOILWATER = 14 ;
...
while all the other tiles, met_em*_0001 to met_em*_0255, have
Code:
// global attributes:
...
        :MAP_PROJ = 1 ;
        :MMINLU = "USGS" ;
        :NUM_LAND_CAT = 0 ;
        :ISWATER = 16 ;
        :ISLAKE = 28 ;
        :ISICE = 24 ;
        :ISURBAN = 1 ;
        :ISOILWATER = 14 ;
...
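For reference, the attribute listings above come straight from the file headers, e.g.:
Code:
ncdump -h met_em.d01.2023-02-23_00_00_00.nc_0001 | grep -i land_cat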

I used
Code:
#!/bin/bash
# Overwrite the NUM_LAND_CAT global attribute in every met_em tile.
for fname in met_em.d0*
do
        echo "Working on $fname"
        ncatted -O -a NUM_LAND_CAT,global,o,i,28 "$fname"
done
as a workaround to reset NUM_LAND_CAT in all met_em tiles, and then real.exe finishes successfully with io_form_auxinput1 = 102.
I couldn't find where NUM_LAND_CAT is written to the individual tiles in the WPS/metgrid source code, so there is no proper fix available yet.
 
Another limitation of running metgrid and real with io_form = 102 is that metgrid.exe can only be run on up to 10000 MPI procs, because the filename suffix is limited to four digits (met_em*_XXXX). In metgrid/src/output_module.F I already updated the filename to use 8 digits, but that doesn't do the trick here:
Code:
      if (nprocs > 1 .and. do_tiled_output) then
         write(output_fname(len_trim(output_fname)+1:len_trim(output_fname)+5), '(a1,i8.8)') &
              '_', my_proc_id
      end if
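One untested guess as to why: '(a1,i8.8)' produces nine characters ('_' plus eight digits), but the internal write above still targets a five-character substring, so the record is too short. Assuming output_fname is declared long enough, the matching change would look roughly like this:
Code:
      if (nprocs > 1 .and. do_tiled_output) then
         ! '(a1,i8.8)' writes 9 characters, so the target substring must span 9 as well
         write(output_fname(len_trim(output_fname)+1:len_trim(output_fname)+9), '(a1,i8.8)') &
              '_', my_proc_id
      end if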
Any hints maybe?

Moreover, I found that real.exe has issues when it is executed on a large number of MPI procs. I am investigating this further.
 