Hi,
I am running several 6-month (April to September) simulations on Cheyenne, with irrigation and no-irrigation setups, for various years. Several years have completed with both setups. Unfortunately, the 2012 and 2011 runs with irrigation stopped after about 4 to 4.5 months, while about three other years completed both setups successfully with no errors.
The first restart file is wrfrst_d01_2012-04-11_00:00:00, and the model stops after restarting from wrfrst_d02_2012-08-09_00:00:00.
The 2011 run failed with the same error after restarting from wrfrst_d02_2011-08-19_00:00:00.
However, three other years with the same setup completed with no errors, and I have not been able to figure out what is causing the crash.
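For reference, the span between the first 2012 restart file and the one the run crashes after can be checked with a short Python sketch. It assumes the standard WRF restart naming `wrfrst_dNN_YYYY-MM-DD_HH:MM:SS`; `restart_time` is a hypothetical helper, not part of WRF.

```python
from datetime import datetime


def restart_time(filename: str) -> datetime:
    """Parse the valid time from a WRF restart file name.

    Assumes the name looks like wrfrst_dNN_YYYY-MM-DD_HH:MM:SS
    (hypothetical helper, for illustration only).
    """
    # Split off "wrfrst" and "dNN"; the third piece is the full timestamp.
    stamp = filename.split("_", 2)[2]
    return datetime.strptime(stamp, "%Y-%m-%d_%H:%M:%S")


first = restart_time("wrfrst_d01_2012-04-11_00:00:00")
last_ok = restart_time("wrfrst_d02_2012-08-09_00:00:00")
elapsed = last_ok - first
print(elapsed.days)  # 120
```

That is about 120 days (roughly four months) of successful integration before the last restart, consistent with the crash occurring after about 4 months.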
This is the path to the problematic run on Cheyenne:
/glade/scratch/achugbu/WRF_IRRI_2012/run
MPT: 0x00002b01b81897da in waitpid ()
MPT: from /glade/u/apps/ch/os/lib64/libpthread.so.0
MPT: Missing separate debuginfos, use: zypper install glibc-debuginfo-2.22-100.27.3.x86_64
MPT: (gdb) #0 0x00002b01b81897da in waitpid ()
MPT: from /glade/u/apps/ch/os/lib64/libpthread.so.0
MPT: #1 0x00002b01b84cec66 in mpi_sgi_system (
MPT: #2 MPI_SGI_stacktraceback (
MPT: header=header@entry=0x7ffe7b9cd390 "MPT ERROR: Rank 0(g:0) received signal SIGSEGV(11).\n\tProcess ID: 4110, Host: r13i4n11, Program: /glade/scratch/achugbu/WRF_IRRI_2012/main/wrf.exe\n\tMPT Version: HPE MPT 2.25 08/14/21 03:05:20\n") at sig.c:340
MPT: #3 0x00002b01b84cee66 in first_arriver_handler (signo=signo@entry=11,
MPT: stack_trace_sem=stack_trace_sem@entry=0x2b01c26e0080) at sig.c:489
MPT: #4 0x00002b01b84cf0f3 in slave_sig_handler (signo=11,
MPT: siginfo=<optimized out>, extra=<optimized out>) at sig.c:565
MPT: #5 <signal handler called>
MPT: #6 0x0000000002ba31c1 in module_sf_sfclayrev_mp_psim_stable_ ()
MPT: #7 0x0000000002b9e734 in module_sf_sfclayrev_mp_sfclayrev1d_ ()
MPT: #8 0x0000000002b9c333 in module_sf_sfclayrev_mp_sfclayrev_ ()
MPT: #9 0x0000000002469603 in module_surface_driver_mp_surface_driver_ ()
MPT: #10 0x0000000001d58015 in module_first_rk_step_part1_mp_first_rk_step_part1_
MPT: ()
MPT: #11 0x0000000001500067 in solve_em_ ()
MPT: #12 0x00000000013153fc in solve_interface_ ()
MPT: #13 0x000000000056431b in module_integrate_mp_integrate_ ()
MPT: #14 0x0000000000564932 in module_integrate_mp_integrate_ ()
MPT: #15 0x0000000000406291 in module_wrf_top_mp_wrf_run_ ()
MPT: #16 0x000000000040624f in MAIN__ ()
MPT: #17 0x00000000004061e2 in main ()
MPT: (gdb) A debugging session is active.
MPT:
MPT: Inferior 1 [process 4110] will be detached.
MPT:
MPT: Quit anyway? (y or n) [answered Y; input not from terminal]
MPT: Detaching from program: /proc/4110/exe, process 4110
MPT: [Inferior 1 (process 4110) detached]
MPT: -----stack traceback ends-----