Error when running wrf.exe: Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

ChristianW

New member
Hi everyone, when I was running wrf.exe, the process in my terminal was terminated with the following output:
...
Timing for main: time 2024-03-28_00:02:56 on domain 3: 1.21472 elapsed seconds
Timing for main: time 2024-03-28_00:03:00 on domain 3: 1.21173 elapsed seconds
Timing for main: time 2024-03-28_00:02:52 on domain 4: 1.20646 elapsed seconds
Timing for main: time 2024-03-28_00:02:56 on domain 4: 1.20870 elapsed seconds
Timing for main: time 2024-03-28_00:03:00 on domain 4: 1.21031 elapsed seconds
Timing for main: time 2024-03-28_00:03:00 on domain 2: 9.71869 elapsed seconds
Timing for main: time 2024-03-28_00:03:00 on domain 1: 29.90280 elapsed seconds

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7f4ca3259d11 in ???
#1 0x7f4ca3258ee5 in ???
#2 0x7f4ca2ef208f in ???
at /build/glibc-LcI20x/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#3 0x5628d1c11ed8 in ???
#4 0x5628d1c13b9c in ???
#5 0x5628d1c25d80 in ???
#6 0x5628d1c282bc in ???
#7 0x5628d17eb0c0 in ???
#8 0x5628d18b85b2 in ???
#9 0x5628d13fafb3 in ???
#10 0x5628d12edf2f in ???
#11 0x5628d05fa56a in ???
#12 0x5628d0592207 in ???
#13 0x5628d0591c3e in ???
#14 0x7f4ca2ed3082 in __libc_start_main
at ../csu/libc-start.c:308
#15 0x5628d0591c7d in ???
#16 0xffffffffffffffff in ???

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 33724 RUNNING AT christian-virtual-machine
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 33725 RUNNING AT christian-virtual-machine
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 2 PID 33726 RUNNING AT christian-virtual-machine
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
...
I have tried different numbers of cores with 'mpirun', such as '-np 4' and '-np 8', but it didn't work; the same error appeared each time. What should I do? My guess is that some of the parameters in my namelist are not suitable. Please give me some advice. BTW, I cannot find any error log files in the directory where I ran wrf.exe (.../WRF/run).
With my appreciation,
Christian Wang
 

Attachments

  • namelist.wps
    914 bytes · Views: 3
  • namelist.input
    5 KB · Views: 5
Try increasing domain 1 to 100x100 grid points or more; the minimum it works with is 100x100.
These links might help too.
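
One way to check this suggestion before rerunning real.exe is to scan the namelist for e_we and e_sn and flag a domain 1 that is smaller than 100 points. The following is a minimal Python sketch, not part of WRF. The helper names (read_int_list, check_domain1_size) are made up for illustration, and the 100-point threshold is the rule of thumb from this reply, not an official WRF limit. The sample values are the ones posted later in this thread.

```python
import re

MIN_POINTS = 100  # rule of thumb from this thread, not an official WRF limit


def read_int_list(namelist_text, key):
    """Return the integer values assigned to `key` in namelist text, or []."""
    pattern = rf"^\s*{re.escape(key)}\s*=\s*([^\n!]*)"
    match = re.search(pattern, namelist_text, re.MULTILINE)
    if match is None:
        return []
    return [int(v) for v in re.findall(r"-?\d+", match.group(1))]


def check_domain1_size(namelist_text, min_points=MIN_POINTS):
    """Return a warning string for each domain 1 dimension below min_points."""
    warnings = []
    for key in ("e_we", "e_sn"):
        values = read_int_list(namelist_text, key)
        # The first entry of each list belongs to domain 1.
        if values and values[0] < min_points:
            warnings.append(f"{key} for domain 1 is {values[0]}, below {min_points}")
    return warnings


sample = """
 e_we = 93, 151,
 e_sn = 88, 172,
 e_vert = 40, 40,
"""

for warning in check_domain1_size(sample):
    print(warning)
```

If the dimensions are changed in namelist.input, the same values would also need to be changed in namelist.wps so that the two files agree.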

 

Thank you for your prompt reply.
When I changed e_we to 109, and e_sn to the same value, then ran real.exe again, it failed as follows:
...
INTERMEDIATE domain
ids,ide,jds,jde 40 100 28 88
ims,ime,jms,jme 35 105 23 93
ips,ipe,jps,jpe 38 102 26 90
*************************************
d03 2024-03-28_06:00:00 alloc_space_field: domain 4 , 31173544 bytes allocated

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 40506 RUNNING AT christian-virtual-machine
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 40507 RUNNING AT christian-virtual-machine
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
 

Can you upload your latest namelist files and your rsl.out and rsl.error files? Also, where are you getting your source data?
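
For a WRF build configured with the dmpar option, each MPI rank writes its own log, named rsl.out.NNNN and rsl.error.NNNN, into the directory where the executable was run. As a rough sketch, assuming those files exist in the run directory, the last few lines of each can be collected like this in Python. The helper name tail_rsl_files is hypothetical and not a WRF tool.

```python
from pathlib import Path


def tail_rsl_files(run_dir=".", n_lines=5):
    """Return {file name: last n_lines lines} for rsl.error.* and rsl.out.* in run_dir."""
    tails = {}
    for pattern in ("rsl.error.*", "rsl.out.*"):
        for path in sorted(Path(run_dir).glob(pattern)):
            # errors="replace" keeps the script from failing on odd bytes in a log.
            lines = path.read_text(errors="replace").splitlines()
            tails[path.name] = lines[-n_lines:]
    return tails


# Print the tail of every rsl file found in the current directory (nothing if none exist).
for name, lines in tail_rsl_files().items():
    print(f"== {name}")
    for line in lines:
        print(line)
```

The end of rsl.error.0000 is usually the first place to look, since the last message before a crash tends to be written there.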
 
Hi, William, thanks for your reply again.
Sorry, I have no rsl.out or rsl.error files because my WRF was configured with the 'serial' option, not 'dmpar'. Should I configure WRF again? All of my source code and data were downloaded with 'wget' from the official website. The attached output.log contains the error and output information from running wrf.exe. max_dom and the size of each domain were changed as follows:
e_we = 93, 151,
e_sn = 88, 172,
e_vert = 40, 40,
p_top_requested = 60000,

Attachments

  • namelist.input
    4.4 KB · Views: 3
  • output.log
    26.4 KB · Views: 1
  • namelist.wps
    753 bytes · Views: 1

Based on the latest files, I would make 93 and 88 over 100 in size, with the same change in the WPS namelist. Also try changing p_top_requested:

p_top_requested | 5000 | pressure top (in Pa) to use in the model; this level must be available in WPS data | single entry
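
For scale, p_top_requested is given in pascals, so the 60000 posted earlier in this thread is 600 hPa, while the 5000 in the documentation entry quoted here is 50 hPa. A small Python sketch of the conversion follows; the helper name pa_to_hpa is made up for illustration.

```python
def pa_to_hpa(pressure_pa):
    """Convert a pressure in pascals to hectopascals (1 hPa = 100 Pa)."""
    return pressure_pa / 100.0


for label, value_pa in (
    ("value in the posted namelist", 60000),
    ("value in the documentation entry", 5000),
):
    print(f"{label}: {value_pa} Pa = {pa_to_hpa(value_pa):.0f} hPa")
```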

 
Thanks, William, I can run wrf.exe successfully now, but what I didn't expect was that wrfoutput_d01* was 17 GB in size per output time.
 

Attachments

  • namelist.input
    4.4 KB · Views: 3