I am trying to understand why I am getting errors when running a model with pnetcdf and quilting. I can run on 2 nodes using 1 I/O group with 6 I/O tasks, but I am not able to run with 2 I/O groups of 4 I/O tasks each. The errors I get do not provide much information.
Does the output below suggest any runtime error that I might be missing?
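For reference, a quilting setup like the failing one would normally be expressed in namelist.input roughly as follows. This is only a sketch, assuming the standard &namelist_quilt variable names; the io_form_history value (11 for pnetcdf) is an assumption and is not copied from my actual file, and all other settings are omitted.

```
&time_control
 io_form_history     = 11,   ! assumed: 11 selects pnetcdf output
/

&namelist_quilt
 nio_tasks_per_group = 4,    ! I/O tasks in each group
 nio_groups          = 2,    ! number of I/O groups
/
```

The working run would differ only in using nio_tasks_per_group = 6 and nio_groups = 1.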
Code:
Quilting with 2 groups of 4 I/O tasks.
Namelist logging not found in namelist.input. Using registry defaults for variables in logging.
Adding field entry no. 1
Variable = XLAT
Domain start = 1 1 1
Domain end = 399 399 1
Variable XLAT, patch 1: (201:300,285:313, 1: 1)
Variable XLAT, patch 2: (301:399,285:313, 1: 1)
Variable XLAT, patch 3: ( 1:100,314:342, 1: 1)
Variable XLAT, patch 4: (101:200,314:342, 1: 1)
Variable XLAT, patch 5: (201:300,314:342, 1: 1)
Variable XLAT, patch 6: (301:399,314:342, 1: 1)
Variable XLAT, patch 7: ( 1:100,343:371, 1: 1)
Variable XLAT, patch 8: (101:200,343:371, 1: 1)
Variable XLAT, patch 9: (201:300,343:371, 1: 1)
Variable XLAT, patch 10: (301:399,343:371, 1: 1)
Variable XLAT, patch 11: ( 1:100,372:399, 1: 1)
Variable XLAT, patch 12: (101:200,372:399, 1: 1)
Variable XLAT, patch 13: (201:300,372:399, 1: 1)
Variable XLAT, patch 14: (301:399,372:399, 1: 1)
write_outbuf_pnc: table has 1 entries
write_outbuf_pnc: writing 2019-12-09_18:00:00 XLAT XY
--------------------------
Calling write for patch: 1 Start = 201 285 1
End = 399 313 1
Calling write for patch: 3 Start = 1 314 1
End = 399 399 1
[g06:2638883:0:2638883] Caught signal 11 (Segmentation fault: tkill(2) or tgkill(2) at address 0xe2ef00284423)
==== backtrace (tid:2638883) ====
0 0x0000000000055be9 ucs_debug_print_backtrace() ???:0
1 0x0000000000012b20 .annobin_sigaction.c() sigaction.c:0
2 0x00000000016ee4fa module_quilt_outbuf_ops_mp_init_outbuf_() /home/Applications/WRF/WRF-3.5.1_icc-2020.4.304_IMPI_real/WRFV3/frame/module_quilt_outbuf_ops.f90:115
3 0x00000000016e0932 module_quilt_outbuf_ops_mp_write_outbuf_pnc_() /home/Applications/WRF/WRF-3.5.1_icc-2020.4.304_IMPI_real/WRFV3//frame/module_quilt_outbuf_ops.f90:298
4 0x00000000014a6942 module_wrf_quilt_mp_quilt_pnc_() /home/Applications/WRF/WRF-3.5.1_icc-2020.4.304_IMPI_real/WRFV3//frame/module_io_quilt.f90:1353
5 0x00000000014a4ca3 module_wrf_quilt_mp_quilt_() //home/Applications/WRF/WRF-3.5.1_icc-2020.4.304_IMPI_real/WRFV3//frame/module_io_quilt.f90:403
6 0x00000000014a8fa8 module_wrf_quilt_mp_init_module_wrf_quilt_() /home/Applications/WRF/WRF-3.5.1_icc-2020.4.304_IMPI_real/WRFV3/frame/module_io_quilt.f90:1473
7 0x0000000000f3ef0e init_modules_() //home/Applications/WRF/WRF-3.5.1_icc-2020.4.304_IMPI_real/WRFV3/share/init_modules.f90:97
8 0x000000000041d665 module_wrf_top_mp_wrf_init_() /home/Applications/WRF/WRF-3.5.1_icc-2020.4.304_IMPI_real/WRFV3/main/../main/module_wrf_top.f90:138
9 0x000000000041cd24 MAIN__() /home/Applications/WRF/WRF-3.5.1_icc-2020.4.304_IMPI_real/WRFV3/main/wrf.f90:75
10 0x000000000041cca2 main() ???:0
11 0x0000000000023493 __libc_start_main() ???:0
12 0x000000000041cbae _start() ???:0