
(SOLVED) Segmentation fault when running tools/registry


milancurcic

Hi WRF forum,

I run into a segmentation fault when building WRF-4.2.1 using xlf on Power9 (configure option 38).

Specifically:

Code:
/bin/sh: line 1: 143159 Segmentation fault      tools/registry -DEM_CORE=1 -DNMM_CORE=0 -DNMM_MAX_DIM=2600 -DDA_CORE=0 -DWRFPLUS=0 -DIWORDSIZE=4 -DDWORDSIZE=8 -DRWORDSIZE=4 -DLWORDSIZE=4 -DNONSTANDARD_SYSTEM_SUBR -DWRF_USE_CLM -DRPC_TYPES=1 -DDM_PARALLEL -DNETCDF -DHDF5 -DLANDREAD_STUB=1 -DMOVE_NESTS -DVORTEX_CENTER -DUSE_ALLOCATABLES -Dwrfmodel -DGRIB1 -DINTIO -DKEEP_INT_AROUND -DLIMIT_ARGS -DBUILD_RRTMG_FAST=0 -DBUILD_RRTMK=0 -DBUILD_SBM_FAST=1 -DSHOW_ALL_VARS_USED=0 -DCONFIG_BUF_LEN=65536 -DMAX_DOMAINS_F=21 -DMAX_HISTORY=25 -DNMM_NEST=0 -DNEW_BDYS Registry/Registry
make[2]: [module_state_description.F] Error 139 (ignored)
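For reference, the "Error 139" that make reports matches the segfault message above it: a shell exit status above 128 means the process was killed by signal (status - 128), and signal 11 is SIGSEGV on Linux. A minimal sketch of that arithmetic follows; decode_status is a hypothetical helper name, not part of WRF or make.

```shell
# Hypothetical helper: interpret a shell exit status.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "signal $((status - 128))"
    else
        echo "exit code $status"
    fi
}

decode_status 139   # prints "signal 11"
```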

This is the last line of the registry output before it segfaults:

Code:
package   tconly         use_wps_input==2            -             state:u_gc,v_gc,t_gc,rh_gc,ght_gc,p_gc,xlat_gc,xlong_gc,ht_gc,tsk_gc,tavgsfc,tmn_gc,pslv_gc,sct_dom_gc,scb_dom_gc,greenfrac,albedo12m,pd_gc,psfc_gc,intq_gc,pdhs,sh_gc,qv_gc,qr_gc,qc_gc,qs_gc,qi_gc,qg_gc,qh_gc,qni_gc,qnc_gc,qnr_gc,qns_gc,qng_gc,qnh_gc,icefrac_gc

I already have ulimit -s unlimited set, so I don't think it's a stack overflow. It must be something else.

Also, I was able to successfully build WRF on this system using their gcc fork (IBM Advance Toolchain). So this is something xlf-specific.

Has anybody run into this issue before? I haven't.

Thank you!
Milan
 
Brief update: If I configure WRF without moving nests (nest option 1), then tools/registry doesn't segfault and completes successfully.
 
Hi,
I assume that you DO want to compile a moving nest case, though? If so, let me know and we will keep looking at this. Otherwise, let me know you are all set and able to move forward with your work. Thanks!
 
Hi, yes, I want to compile it with moving nests. I am now looking into this problem again. Please let me know if you find anything and I'll do the same.
 
Since it's been a few weeks, I compiled WRF with the moving nests option again to reproduce the problem. I can still reproduce the segfault when running registry.

Interestingly, if I do:

Code:
cd tools
make clean
make
cd -
tools/registry -DEM_CORE=1 -DNMM_CORE=0 -DNMM_MAX_DIM=2600 -DDA_CORE=0 -DWRFPLUS=0 -DIWORDSIZE=4 -DDWORDSIZE=8 -DRWORDSIZE=4 -DLWORDSIZE=4 -DNONSTANDARD_SYSTEM_SUBR -DWRF_USE_CLM -DUSE_NETCDF4_FEATURES -DWRFIO_NCD_LARGE_FILE_SUPPORT -DRPC_TYPES=1 -DDM_PARALLEL -DNETCDF -DHDF5 -DLANDREAD_STUB=1 -DMOVE_NESTS -DVORTEX_CENTER -DUSE_ALLOCATABLES -Dwrfmodel -DGRIB1 -DINTIO -DKEEP_INT_AROUND -DLIMIT_ARGS -DBUILD_RRTMG_FAST=0 -DBUILD_RRTMK=0 -DBUILD_SBM_FAST=1 -DSHOW_ALL_VARS_USED=0 -DCONFIG_BUF_LEN=65536 -DMAX_DOMAINS_F=21 -DMAX_HISTORY=25 -DNMM_NEST=0 -DNEW_BDYS Registry/Registry

then the registry program does not segfault!

Issuing "compile em_real" now lets the build proceed, but it eventually fails with the following:

Code:
 mpif90 -o module_check_a_mundo.o -c -q64 -O3 -qstrict  -w -qspill=81920 -qmaxmem=-1 -qfree=f90 -qufmt=be      -I../dyn_em -I../dyn_nmm  -I/home/mcurcic/github/WRF/external/esmf_time_f90  -I/home/mcurcic/github/WRF/main -I/home/mcurcic/github/WRF/external/io_netcdf -I/home/mcurcic/github/WRF/external/io_int -I/home/mcurcic/github/WRF/frame -I/home/mcurcic/github/WRF/share -I/home/mcurcic/github/WRF/phys -I/home/mcurcic/github/WRF/wrftladj -I/home/mcurcic/github/WRF/chem -I/home/mcurcic/github/WRF/inc -I/home/mcurcic/opt/netcdf-4.7.4_xl-16.1.1/include  -qrealsize=4 -qintsize=4 -qsuffix=f=f90 module_check_a_mundo.f90
** module_get_file_names   === End of Compilation 1 ===
"module_check_a_mundo.f90", line 1556.43: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1557.43: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1558.43: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1559.43: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1560.43: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1561.38: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1573.67: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1574.67: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1575.67: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1576.67: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1577.67: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1589.43: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1590.43: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1591.43: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1592.43: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1593.43: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1594.38: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1606.67: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1607.67: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1608.67: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1609.67: 1514-088 (S) Invalid component name.
"module_check_a_mundo.f90", line 1610.67: 1514-088 (S) Invalid component name.
** module_check_a_mundo   === End of Compilation 1 ===
1501-511  Compilation failed for file module_check_a_mundo.f90.

and also:

Code:
/home/mcurcic/github/WRF/tools/standard.exe output_wrf.F > output_wrf.b
cpp -P -nostdinc -Uvector -I/home/mcurcic/github/WRF/inc -DEM_CORE=1 -DNMM_CORE=0 -DNMM_MAX_DIM=2600 -DDA_CORE=0 -DWRFPLUS=0 -DIWORDSIZE=4 -DDWORDSIZE=8 -DRWORDSIZE=4 -DLWORDSIZE=4 -DNONSTANDARD_SYSTEM_SUBR -DWRF_USE_CLM -DUSE_NETCDF4_FEATURES -DWRFIO_NCD_LARGE_FILE_SUPPORT -DRPC_TYPES=1  -DDM_PARALLEL -DNETCDF -DHDF5 -DLANDREAD_STUB=1 -DMOVE_NESTS -DVORTEX_CENTER -DUSE_ALLOCATABLES -Dwrfmodel -DGRIB1 -DINTIO -DKEEP_INT_AROUND -DLIMIT_ARGS -DBUILD_RRTMG_FAST=0 -DBUILD_RRTMK=0 -DBUILD_SBM_FAST=1 -DSHOW_ALL_VARS_USED=0 -DCONFIG_BUF_LEN=65536 -DMAX_DOMAINS_F=21 -DMAX_HISTORY=25 -DNMM_NEST=0  -I. -DUSE_NETCDF4_FEATURES -DWRFIO_NCD_LARGE_FILE_SUPPORT -traditional-cpp   output_wrf.b  > output_wrf.f90
rm -f output_wrf.b
if fgrep -iq '!$OMP' output_wrf.f90 ; then \
          if [ -n "" ] ; then echo COMPILING output_wrf.F WITH OMP ; fi ; \
  mpif90 -c -qrealsize=4 -qintsize=4 -q64 -qnoopt -qstrict  -w -qspill=81920 -qmaxmem=-1 -qfree=f90 -qufmt=be     -I../dyn_em -I../dyn_nmm  -I/home/mcurcic/github/WRF/external/esmf_time_f90  -I/home/mcurcic/github/WRF/main -I/home/mcurcic/github/WRF/external/io_netcdf -I/home/mcurcic/github/WRF/external/io_int -I/home/mcurcic/github/WRF/frame -I/home/mcurcic/github/WRF/share -I/home/mcurcic/github/WRF/phys -I/home/mcurcic/github/WRF/wrftladj -I/home/mcurcic/github/WRF/chem -I/home/mcurcic/github/WRF/inc -I/home/mcurcic/opt/netcdf-4.7.4_xl-16.1.1/include  -qsuffix=f=f90  output_wrf.f90 ; \
        else \
          if [ -n "" ] ; then echo COMPILING output_wrf.F WITHOUT OMP ; fi ; \
  mpif90 -c -qrealsize=4 -qintsize=4 -q64 -qnoopt -qstrict  -w -qspill=81920 -qmaxmem=-1 -qfree=f90 -qufmt=be     -I../dyn_em -I../dyn_nmm  -I/home/mcurcic/github/WRF/external/esmf_time_f90  -I/home/mcurcic/github/WRF/main -I/home/mcurcic/github/WRF/external/io_netcdf -I/home/mcurcic/github/WRF/external/io_int -I/home/mcurcic/github/WRF/frame -I/home/mcurcic/github/WRF/share -I/home/mcurcic/github/WRF/phys -I/home/mcurcic/github/WRF/wrftladj -I/home/mcurcic/github/WRF/chem -I/home/mcurcic/github/WRF/inc -I/home/mcurcic/opt/netcdf-4.7.4_xl-16.1.1/include  -qsuffix=f=f90 output_wrf.f90 ; \
        fi
"output_wrf.f90", line 641.24: 1516-036 (S) Entity auxhist23_only has undefined type.

Indeed, auxhist23_only is referenced in share/output_wrf.F, but is not declared anywhere. This makes me think that auxhist23_only then must come from one of the auto-generated include files.
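One way to check that hunch is to search the generated include directory for the symbol. The sketch below assumes the generated files are under the inc directory that the compile lines pass with -I; find_symbol is a hypothetical helper name, not a WRF tool.

```shell
# Hypothetical helper: list files under a directory that mention a symbol.
# -r recurses, -l prints only file names, -F treats the symbol as a fixed string.
find_symbol() {
    symbol=$1
    dir=$2
    grep -rlF -- "$symbol" "$dir" 2>/dev/null | sort
}
```

From the top of the WRF tree it would be called as find_symbol auxhist23_only inc. An empty result would suggest that the registry run never generated the declaration.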

The only reference to auxhist23 I can find is in Registry/registry.diags:

Code:
#  Derived, this is interval in seconds that is from auxhist23 interval, computed in check_a_mundo

rconfig   real        p_lev_interval      derived          max_domains  0   -    "interval to compute/output p level diags"                   "s"

Do you have a hint for me on where to look for the issue?

Thank you!
 
Okay, I needed to pass the proper CPP macros to make when building the registry program:

Code:
make -i -r CC_TOOLS_CFLAGS="-DNMM_CORE=0" CC_TOOLS="cc -DIWORDSIZE=4 -DMAX_HISTORY=25"

So I'm doing the same as what compile em_real was doing (based on the log), except I'm doing it by hand. registry still doesn't segfault with this, and now all executables compile.
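To avoid retyping macros by hand, the -D flags can be pulled out of a command line copied from the build log. This is only a sketch: extract_defines is a hypothetical helper, and which of the extracted macros the tools build actually needs still has to be read from the log.

```shell
# Hypothetical helper: print only the -D tokens of a command line, space-separated.
extract_defines() {
    # Split on whitespace, keep tokens that start with -D, then rejoin with spaces.
    printf '%s\n' "$1" | tr -s ' \t' '\n' | grep -- '^-D' | tr '\n' ' ' | sed 's/ $//'
}

extract_defines "cc -DIWORDSIZE=4 -O2 -DMAX_HISTORY=25 registry.c"
# prints: -DIWORDSIZE=4 -DMAX_HISTORY=25
```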

I will make some test runs and see if all looks okay.
 
I'm able to compile successfully by building the registry program manually. I don't know why it makes a difference.

The model runs with nests, but it segfaults on the first nest move. I will open a separate topic on this problem and ask for advice on how to diagnose it further.
 
That is very odd that you had to build it that way, but I'm glad you were able to get it working. Thank you for updating the post!
 