
NETCDFPARPATH

gthompsnGE

I am confused by the terminology around parallel netCDF. I used v4.0.3 for years and decided to move up to v4.6.1, where a new environment variable, NETCDFPAR, seems to have been added alongside PNETCDF. The distinction between the two is very unclear (to me). Even the doc/README.netcdf4par file is unclear.

I tried to keep external/io_netcdfpar from compiling by removing the -netcdfpar=$NETCDFPAR chunk from the line in the configure script that calls arch/Config.pl, but the build still attempts to include io_netcdfpar rather than skip it.

I was under the impression that the environment variable NETCDFPAR must be set in order for that subdirectory to be compiled, but it is set neither in my environment nor in my final configure.wrf file. So can anyone suggest how to exclude this subdirectory from compilation?

FYI, I do have the PNETCDF environment variable set, because I want PNETCDF support but not NETCDFPAR.
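A quick way to confirm which of these variables the shell actually has set before running configure might look like the sketch below. The variable names are the ones mentioned in this thread; the helper name report_var is hypothetical and not part of WRF.

```shell
#!/bin/sh
# Report whether a given variable is set in the current shell.
# report_var is a hypothetical helper, not something WRF provides.
report_var() {
    name="$1"
    # ${VAR+x} expands to "x" only when VAR is set (even if empty).
    eval "is_set=\${$name+x}"
    if [ -n "$is_set" ]; then
        eval "value=\${$name}"
        echo "$name is set to '$value'"
    else
        echo "$name is not set"
    fi
}

# Variables discussed in this thread.
for v in NETCDF PNETCDF NETCDFPAR NETCDFPARPATH; do
    report_var "$v"
done
```

Running this in the same shell that will run configure avoids surprises from variables that are set in one login session but not another.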

Thanks for any ideas I can try.
 
From the doc/README.netcdf4par :
Using parallel IO through the netCDF-4 interface (io_form = 13)

If you don't need variable-level compression, stop and go use pnetcdf
(parallel-netcdf-1.9.0), which will have better IO performance. (Also
should be using a parallel file system to gain benefits.)
netcdf4par is compression-enabled (via HDF5) parallel IO of netCDF4. If no compression is needed, a more viable alternative is parallel netCDF (PnetCDF).
The naming conventions are certainly confusing.

You do not need to manually remove the NETCDFPAR logic from any part of the build system; if it is not enabled, compilation of that folder is skipped entirely. You should see a "SKIPPING: <netcdfpar directory and make command>" line in the build log. For example, mine outputs :
Code:
cd ../io_netcdfpar ; \
          echo SKIPPING make -i -r NETCDFPARPATH="/home/aislas/wrf-model/forum_help/tmp/wrf_dependencies/netcdf" NETCDF4_DEP_LIB="-L/home/aislas/wrf-model/forum_help/tmp/wrf_dependencies/netcdf/lib -lnetcdf -L/home/aislas/wrf-model/forum_help/tmp/wrf_dependencies/netcdf/lib -lnetcdff -L/home/aislas/wrf-model/forum_help/tmp/wrf_dependencies/netcdf/lib -L/home/aislas/wrf-model/forum_help/tmp/wrf_dependencies/grib2/lib -lnetcdf -lm -lnetcdf -lz" \
               FC="gfortran -w -ffree-form -ffree-line-length-none -fconvert=big-endian -frecord-marker=4 -fallow-argument-mismatch -fallow-invalid-boz    " RANLIB="ranlib" \
               CPP="/lib/cpp -P -nostdinc" LDFLAGS=" -O2 -ftree-vectorize -funroll-loops -w -ffree-form -ffree-line-length-none -fconvert=big-endian -frecord-marker=4 -fallow-argument-mismatch -fallow-invalid-boz   " TRADFLAG="-traditional-cpp -DUSE_NETCDF4_FEATURES -DWRFIO_NCD_LARGE_FILE_SUPPORT" ESMF_IO_LIB_EXT="-L/home/aislas/wrf-model/wrf/external/esmf_time_f90 -lesmf_time" \
       LIB_LOCAL="" \
               ESMF_MOD_DEPENDENCE="/home/aislas/wrf-model/wrf/external/esmf_time_f90/module_utility.o" AR="INTERNAL_BUILD_ERROR_SHOULD_NOT_NEED_AR";

which then leads to :
Code:
SKIPPING make -i -r NETCDFPARPATH=/home/aislas/wrf-model/forum_help/tmp/wrf_dependencies/netcdf NETCDF4_DEP_LIB=-L/home/aislas/wrf-model/forum_help/tmp/wrf_dependencies/netcdf/lib -lnetcdf -L/home/aislas/wrf-model/forum_help/tmp/wrf_dependencies/netcdf/lib -lnetcdff -L/home/aislas/wrf-model/forum_help/tmp/wrf_dependencies/netcdf/lib -L/home/aislas/wrf-model/forum_help/tmp/wrf_dependencies/grib2/lib -lnetcdf -lm -lnetcdf -lz FC=gfortran -w -ffree-form -ffree-line-length-none -fconvert=big-endian -frecord-marker=4 -fallow-argument-mismatch -fallow-invalid-boz     RANLIB=ranlib CPP=/lib/cpp -P -nostdinc LDFLAGS= -O2 -ftree-vectorize -funroll-loops -w -ffree-form -ffree-line-length-none -fconvert=big-endian -frecord-marker=4 -fallow-argument-mismatch -fallow-invalid-boz    TRADFLAG=-traditional-cpp -DUSE_NETCDF4_FEATURES -DWRFIO_NCD_LARGE_FILE_SUPPORT ESMF_IO_LIB_EXT=-L/home/aislas/wrf-model/wrf/external/esmf_time_f90 -lesmf_time LIB_LOCAL= ESMF_MOD_DEPENDENCE=/home/aislas/wrf-model/wrf/external/esmf_time_f90/module_utility.o AR=INTERNAL_BUILD_ERROR_SHOULD_NOT_NEED_AR
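To check a saved build log for this behaviour without reading it by eye, something like the following sketch could be used. It assumes the skip appears as a line that begins with SKIPPING and mentions NETCDFPARPATH, as in the output above; the helper name and the log file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether io_netcdfpar was skipped in a build log.
# Assumption: a skipped build prints a line starting with "SKIPPING" that
# contains "NETCDFPARPATH", matching the sample output in this thread.
netcdfpar_skipped() {
    log="$1"
    if grep -q '^[[:space:]]*SKIPPING .*NETCDFPARPATH' "$log"; then
        echo "io_netcdfpar skipped"
    else
        echo "io_netcdfpar NOT skipped"
    fi
}

# Usage (log file name is an assumption):
#   netcdfpar_skipped compile.log
```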
 
But that is exactly the problem. It IS NOT skipping as I think it should, and I cannot determine why. All of my efforts to bypass NETCDFPAR have failed. I do see: SKIPPING PIO_BUILD, but for reasons I'm unable to understand, the same is not happening for NETCDFPAR.

All appears fine when I build HDF5, parallel-netcdf, netcdf-c, and netcdf-fortran - all of these are building nicely via Spack. When running configure, I keep getting the section

wrfio_nfpar : (cd . . .

in the resulting configure.wrf file. So I attempted to take it out of there and also removed it from arch/postamble, yet at compile time I keep seeing NETCDFPAR attempting to build instead of skipping. I also removed entirely from the configure script the chunk passed to Config.pl that contains netcdfpar=$NETCDFPAR. It's super perplexing (to me).

Any thoughts for how that would be the case?
 
Could you upload your configure.wrf and build log from a clean, unmodified WRF setup?
 
Sure. I am starting over yet again from a "zero state", because I noticed that I'm building on AWS Graviton but my last attempt used MPICH, and from reviewing arch/configure.defaults it seems I should swap MPICH for OpenMPI. Once my Spack dependencies rebuild, I'll run configure from a totally fresh git clone of WRF and capture what you asked for. Thanks.
 
I am attaching 3 GZIP files.

The first file is a capture of stdout from running the configure script. It shows that USENETCDFPAR=0, along with the output of "env | grep PATH", which I placed into the configure script. This clearly shows that NETCDFPARPATH is not set in my environment.

The second file is the resulting configure.wrf.

The third file is a snippet of the compilation log (stdout/stderr) showing the switch first to external/io_netcdf, followed by a switch to external/io_netcdfpar, where compilation fails with

Fatal Error: Cannot open module file ‘wrf_data_ncpar.mod’ for reading at (1): No such file or directory

But I just cannot determine why it is attempting to compile in this directory at all.
 

Attachments

  • compile.log.snippet.txt.gz (1.8 KB)
  • configure.stdout.txt.gz (3.7 KB)
  • configure.wrf.gthompsn.gz (5.2 KB)
 
It looks as if in arch/Config.pl the configuration is skipping these lines (line numbers based on v4.6.1):
Code:
 683      else
 684        { $_ =~ s/CONFIGURE_WRFIO_NFPAR//g ;
 685          $_ =~ s:CONFIGURE_NETCDFPAR_FLAG::g ;
 686          $_ =~ s:CONFIGURE_NETCDFPAR_BUILD:echo SKIPPING: ;
 687          $_ =~ s:CONFIGURE_NETCDFPAR_LIB_PATH::g ;
 688           }

You should be able to add something like
Code:
printf "Disabling NETCDFPAR\n" ;
somewhere in that else block, then reconfigure and see whether it is printed in the stdout of the configuration process. It looks like the test of $sw_netcdfpar_path is being evaluated correctly, since further into the file CONFIGURE_NETCDFPAR_FLAG is removed from the ARCHFLAGS, as expected for your configuration settings.
 
I put a print statement in both code blocks in `arch/Config.pl` and confirmed that internally $sw_netcdfpar_path *is not set*. Just as before, the build (compile) still switches into `io_netcdfpar` and fails in there. It just isn't getting skipped no matter what I've tried.
 
Please add the following to arch/configure.defaults at the bottom of the stanza you are selecting (Ctrl+F "Linux aarch64, GCC compiler OpenMPI"):
Code:
NETCDFPAR_BUILD    =      CONFIGURE_NETCDFPAR_BUILD
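One plausible reading of why this line matters, offered as an assumption rather than a confirmed diagnosis: the substitutions in arch/Config.pl can only rewrite a placeholder that is actually present in the selected stanza. The sketch below imitates the four substitutions from the else block quoted earlier using sed; the function name disable_netcdfpar is hypothetical, and the second input line is made up for illustration.

```shell
#!/bin/sh
# Imitate, with sed, the substitutions applied when NETCDFPAR is disabled.
# disable_netcdfpar is a hypothetical name; the patterns are copied from
# the arch/Config.pl excerpt in this thread.
disable_netcdfpar() {
    sed -e 's/CONFIGURE_WRFIO_NFPAR//g' \
        -e 's:CONFIGURE_NETCDFPAR_FLAG::g' \
        -e 's:CONFIGURE_NETCDFPAR_BUILD:echo SKIPPING:' \
        -e 's:CONFIGURE_NETCDFPAR_LIB_PATH::g'
}

# A stanza line carrying the placeholder is rewritten to "echo SKIPPING".
printf 'NETCDFPAR_BUILD    =      CONFIGURE_NETCDFPAR_BUILD\n' | disable_netcdfpar

# A line without any placeholder (made-up example) passes through unchanged,
# so a stanza that lacks the placeholder never gains a NETCDFPAR_BUILD setting.
printf 'FC = gfortran\n' | disable_netcdfpar
```

If that reading is right, a stanza without the NETCDFPAR_BUILD line leaves the variable undefined in configure.wrf, and the make command is no longer prefixed with echo SKIPPING.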
 
Thank you @islas. That did get me past the impasse of failing to compile the netcdfpar subdirectory.
However, when attempting to build `wrf.exe` there was a failure at link time, shown by this snippet (although no error showed in the log file when compiling mediation_interp_domain.f90):


time mpif90 -o wrf.exe -fopenmp -Ofast -march=armv8.2-a+fp16+rcpc+dotprod -funroll-loops -fno-expensive-optimizations -fno-reciprocal-math -fsigned-zeros -fno-unsafe-math-optimizations -w -ffree-form -ffree-line-length-0 -fallow-argument-mismatch -fallow-invalid-boz -fconvert=big-endian -frecord-marker=4 -fopenmp wrf.o ../main/module_wrf_top.o libwrflib.a /home/ec2-user/src/WRF_V4.6.1/external/fftpack/fftpack5/libfftpack.a /home/ec2-user/src/WRF_V4.6.1/external/io_grib1/libio_grib1.a /home/ec2-user/src/WRF_V4.6.1/external/io_grib_share/libio_grib_share.a /home/ec2-user/src/WRF_V4.6.1/external/io_int/libwrfio_int.a -L/home/ec2-user/src/WRF_V4.6.1/external/esmf_time_f90 -lesmf_time /home/ec2-user/src/WRF_V4.6.1/external/RSL_LITE/librsl_lite.a /home/ec2-user/src/WRF_V4.6.1/frame/module_internal_header_util.o /home/ec2-user/src/WRF_V4.6.1/frame/pack_utils.o -L/home/ec2-user/src/WRF_V4.6.1/external/io_netcdf -lwrfio_nf -L/home/ec2-user/src/spack/opt/spack/netcdf/lib -lnetcdff -lnetcdf -L/home/ec2-user/src/WRF_V4.6.1/external/io_pnetcdf -lwrfio_pnf -L/home/ec2-user/src/spack/opt/spack/parallel-netcdf-1.14.0/lib -lpnetcdf -L/home/ec2-user/src/spack/opt/spack/hdf5-1.14.5/lib -lhdf5_hl_fortran -lhdf5_hl -lhdf5_fortran -lhdf5 -lm -lz -L/home/ec2-user/src/spack/opt/spack/netcdf-c-4.9.2/lib -lnetcdf -L/home/ec2-user/src/spack/opt/spack/netcdf-fortran-4.6.1/lib -lnetcdff -lnetcdf -lnetcdf -lm

/usr/bin/ld: libwrflib.a(mediation_interp_domain.o): in function `med_interp_domain_':
mediation_interp_domain.f90:(.text+0x6c4): undefined reference to `interp_domain_em_part1_'
/usr/bin/ld: mediation_interp_domain.f90:(.text+0x1868): undefined reference to `interp_domain_em_part2_'
/usr/bin/ld: mediation_interp_domain.f90:(.text+0x299c): undefined reference to `interp_domain_em_part3_'
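To see whether any object in the archive actually defines these routines, the symbol table can be inspected with nm. The sketch below is a hypothetical filter, not a WRF tool: it reads nm output on stdin and lists the interp_domain_em_part symbols that are referenced (type U) but never defined (type T).

```shell
#!/bin/sh
# Hypothetical filter over `nm` output: print interp_domain_em_part* symbols
# that appear as undefined (U) but are never defined in text (T).
undefined_only() {
    awk '
        $NF ~ /interp_domain_em_part/ {
            type = $(NF-1)            # symbol type precedes the name
            if (type == "T") defined[$NF] = 1
            else if (type == "U") wanted[$NF] = 1
        }
        END { for (s in wanted) if (!(s in defined)) print s }
    ' | sort
}

# Usage, from the directory that contains the archive:
#   nm libwrflib.a | undefined_only
```

Any name this prints is one the linker will also fail to resolve unless another library on the link line provides it.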
 
I do not understand why my reply with the compile log attached was deleted, but I did as requested: I did a fresh clean -a, then compiled, gzipped, and uploaded the log. That post has disappeared, so I did exactly the same again. As happened previously, I am seeing that RSL_LITE source codes are not being compiled, specifically

interp_domain_em_part[123].F
 

Attachments

  • compile.log.gz (51.3 KB)
 
But isn't that simply a comment line? The stanza in `arch/configure.defaults` has a DMPARALLEL = <blank> which I am guessing is identical to your suggestion. Is it not?
 
The comment gets transformed into :
Code:
DMPARALLEL      =       1
inside of arch/Config.pl if a dm configuration has been selected. When DMPARALLEL=1, frame/Makefile adds the missing RSL_LITE objects to the list of files to compile.
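After reconfiguring, it may be worth confirming that the generated file really contains the active setting. The following is a minimal sketch, assuming the active form is exactly the assignment shown above; the helper name dmparallel_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a configure.wrf-style file has an
# active "DMPARALLEL = 1" assignment. A blank value, or a value that is
# still hidden behind '#', does not count as enabled.
dmparallel_state() {
    cfg="$1"
    if grep -Eq '^[[:space:]]*DMPARALLEL[[:space:]]*=[[:space:]]*1' "$cfg"; then
        echo "DMPARALLEL enabled"
    else
        echo "DMPARALLEL not enabled"
    fi
}

# Usage:
#   dmparallel_state configure.wrf
```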
 
OK. Great news. At last, this is working. I get the proper wrf.exe, real.exe, etc.

Thank you for the help. Will you be submitting a bug fix to the WRF team for this stanza of `arch/configure.defaults`?
 