Hi all,
I have a very large domain (5000x5000 grid points) and want to create met_em files from ECMWF IFS input data with a large number of vertical levels, which makes things even more difficult...
Creating the met_em files works, but it takes a lot of I/O time with io_form_metgrid = 2 writing out (compressed) netCDF-4 files. I can speed up the output using io_form_metgrid = 102; however, running (a modified version of) the netCDF joiner to merge the small netCDF files written by each MPI rank also takes ages (>30 min per timestep), although it eventually creates the correct met_em files.
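For context, these switches are just the io_form_metgrid entry in the &metgrid record of namelist.wps (other entries omitted; 2 = compressed netCDF-4, 102 = one file per MPI rank, 11 = the pnetCDF output I am trying to add):
Code:
&metgrid
 io_form_metgrid = 2,
/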
To speed things up, I want to add pnetCDF capability to WPS/metgrid (io_form_metgrid = 11). I have already modified the pnetCDF implementation in my WRF installation to write CDF-5 files, to circumvent the size restrictions of the 64-bit-offset format that pnetCDF otherwise writes for such large domains. I have also implemented all the necessary parts in WPS so that it compiles with pnetCDF linked.
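(For reference, the CDF-5 part boils down to including the NF_64BIT_DATA flag in the create mode passed to nfmpi_create in WRF's pnetCDF I/O layer. Below is a minimal standalone sketch of that idea, not my exact diff against external/io_pnetcdf:)
Code:
! Standalone sketch (not my WRF diff): request CDF-5 output from pnetCDF
! by including NF_64BIT_DATA in the create mode of nfmpi_create.
program create_cdf5_sketch
   use mpi
   implicit none
   include 'pnetcdf.inc'
   integer :: ierr, ncid, cmode
   call MPI_Init(ierr)
   cmode = IOR(NF_CLOBBER, NF_64BIT_DATA)   ! NF_64BIT_DATA selects the CDF-5 format
   ierr = nfmpi_create(MPI_COMM_WORLD, 'cdf5_test.nc', cmode, MPI_INFO_NULL, ncid)
   ierr = nfmpi_enddef(ncid)    ! leave define mode (no dims/vars defined in this sketch)
   ierr = nfmpi_close(ncid)
   call MPI_Finalize(ierr)
end program create_cdf5_sketch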
I modified metgrid/src/output_module.F in WPS in several places, including this one:
Code:
#ifdef IO_BINARY
      if (io_form_output == BINARY) then
         call ext_int_write_field(handle, datestr, trim(fields(i)%fieldname), &
                                  real_dom_array, WRF_REAL, comm_1, comm_2, domain_desc, trim(fields(i)%mem_order), &
                                  trim(fields(i)%stagger), fields(i)%dimnames, sd, ed, sm, em, sp, ep, istatus)
      end if
#endif
#ifdef IO_NETCDF
      if (io_form_output == NETCDF) then
         call ext_ncd_write_field(handle, datestr, trim(fields(i)%fieldname), &
                                  real_dom_array, WRF_REAL, comm_1, comm_2, domain_desc, trim(fields(i)%mem_order), &
                                  trim(fields(i)%stagger), fields(i)%dimnames, sd, ed, sm, em, sp, ep, istatus)
      end if
#endif
#ifdef IO_PNETCDF
      if (io_form_output == PNETCDF) then
         call ext_pnc_write_field(handle, datestr, trim(fields(i)%fieldname), &
                                  real_dom_array, WRF_REAL, comm_1, comm_2, domain_desc, trim(fields(i)%mem_order), &
                                  trim(fields(i)%stagger), fields(i)%dimnames, sd, ed, sm, em, sp, ep, istatus)
      end if
#endif
#ifdef IO_GRIB1
      if (io_form_output == GRIB1) then
         call ext_gr1_write_field(handle, datestr, trim(fields(i)%fieldname), &
                                  real_dom_array, WRF_REAL, comm_1, comm_2, domain_desc, trim(fields(i)%mem_order), &
                                  trim(fields(i)%stagger), fields(i)%dimnames, sd, ed, sm, em, sp, ep, istatus)
      end if
#endif
      call mprintf((istatus /= 0),ERROR,'Error in ext_pkg_write_field')
This resulted in:
Code:
Abort(805912325) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(106): MPI_Comm_rank(comm=0x1, rank=0x7fffc89053cc) failed
PMPI_Comm_rank(63).: Invalid communicator
For debugging I changed the communicators passed to the ext_pnc_* calls (ext_pnc_open_for_write and ext_pnc_write_field) to MPI_COMM_WORLD:
Code:
#ifdef IO_PNETCDF
      if (io_form_output == PNETCDF) then
         call ext_pnc_open_for_write(trim(output_fname), MPI_COMM_WORLD, MPI_COMM_WORLD, 'sysdep info', handle, istatus)
         ! call ext_pnc_open_for_write(trim(output_fname), comm_1, comm_2, 'sysdep info', handle, istatus)
      end if
#endif
Running metgrid in serial mode (./metgrid.exe) and with mpirun -np 1 ./metgrid.exe now works, and the files are created via pnetCDF in CDF-5 format; everything is correct.
However, when running mpirun with more than one rank, execution stalls at the log message "--- Initializing output module."
So it seems to me that using MPI_COMM_WORLD is not appropriate here. In the WRF source code I found that grid%communicator is passed to ext_pnc_open_for_write.
To me this looks like an issue with the MPI communicator passed to the subroutine. Any advice on which communicator to use in WPS for the call to ext_pnc_open_for_write (and similar routines)? Any help is very much appreciated!
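In case it helps to show the direction I have in mind: something like duplicating MPI_COMM_WORLD into a dedicated handle at startup (analogous to WRF's grid%communicator) and passing that to the ext_pnc_* calls instead of the comm_1/comm_2 placeholders. The module and variable names below are purely illustrative, not actual WPS code:
Code:
! Illustrative sketch only -- wps_pnc_comm / wps_io_comm / pio_init are made-up names.
module wps_pnc_comm
   use mpi
   implicit none
   integer :: wps_io_comm   ! communicator handed to ext_pnc_open_for_write / ext_pnc_write_field
contains
   subroutine pio_init()
      ! Call this once after MPI has been initialized.
      integer :: ierr
      ! Give the I/O layer its own valid handle, analogous to WRF's grid%communicator.
      call MPI_Comm_dup(MPI_COMM_WORLD, wps_io_comm, ierr)
   end subroutine pio_init
end module wps_pnc_comm
output_module.F would then pass wps_io_comm in both communicator slots of ext_pnc_open_for_write, assuming every rank actually reaches that call collectively.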
I am willing to contribute the pnetCDF capability to WPS once it is fully working with more than one MPI rank.