Error in `./wrf.exe': free(): invalid pointer: 0x000000000bf57790

antoniom

Dear all,

I have built a Beowulf cluster to run the WRF 4.0 model.

However, I get the following error when I run wrf.exe:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

*** Error in `./wrf.exe': free(): invalid pointer: 0x000000000bf57790 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81499)[0x7fc88bae0499]
./wrf.exe[0x27a5bae]
./wrf.exe[0x27a7f17]
./wrf.exe[0x27ad60e]
./wrf.exe[0x20e4290]
./wrf.exe[0x1895c75]
./wrf.exe[0x1240991]
./wrf.exe[0x111a216]
./wrf.exe[0x475eb1]
./wrf.exe[0x408114]
./wrf.exe[0x40792d]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fc88ba81445]
./wrf.exe[0x407964]
======= Memory map: ========
00400000-03163000 r-xp 00000000 00:27 34360439 /home/placa1/wrf/WRF/main/wrf.exe
03362000-03363000 r--p 02d62000 00:27 34360439 /home/placa1/wrf/WRF/main/wrf.exe
03363000-033b5000 rw-p 02d63000 00:27 34360439 /home/placa1/wrf/WRF/main/wrf.exe
033b5000-08f80000 rw-p 00000000 00:00 0
0987a000-0cf5b000 rw-p 00000000 00:00 0 [heap]
7fc870000000-7fc870021000 rw-p 00000000 00:00 0
7fc870021000-7fc874000000 ---p 00000000 00:00 0
7fc877ce8000-7fc87ce6e000 rw-p 00000000 00:00 0
7fc87d868000-7fc886989000 rw-p 00000000 00:00 0
7fc886add000-7fc888005000 rw-p 00000000 00:00 0
7fc88801a000-7fc88ba5f000 rw-p 00000000 00:00 0
7fc88ba5f000-7fc88bc22000 r-xp 00000000 fd:00 14923 /usr/lib64/libc-2.17.so
7fc88bc22000-7fc88be21000 ---p 001c3000 fd:00 14923 /usr/lib64/libc-2.17.so
7fc88be21000-7fc88be25000 r--p 001c2000 fd:00 14923 /usr/lib64/libc-2.17.so
7fc88be25000-7fc88be27000 rw-p 001c6000 fd:00 14923 /usr/lib64/libc-2.17.so
7fc88be27000-7fc88be2c000 rw-p 00000000 00:00 0
7fc88be2c000-7fc88be67000 r-xp 00000000 fd:00 32193 /usr/lib64/libquadmath.so.0.0.0
7fc88be67000-7fc88c066000 ---p 0003b000 fd:00 32193 /usr/lib64/libquadmath.so.0.0.0
7fc88c066000-7fc88c067000 r--p 0003a000 fd:00 32193 /usr/lib64/libquadmath.so.0.0.0
7fc88c067000-7fc88c068000 rw-p 0003b000 fd:00 32193 /usr/lib64/libquadmath.so.0.0.0
7fc88c068000-7fc88c07d000 r-xp 00000000 fd:00 14912 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fc88c07d000-7fc88c27c000 ---p 00015000 fd:00 14912 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fc88c27c000-7fc88c27d000 r--p 00014000 fd:00 14912 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fc88c27d000-7fc88c27e000 rw-p 00015000 fd:00 14912 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fc88c27e000-7fc88c37f000 r-xp 00000000 fd:00 14931 /usr/lib64/libm-2.17.so
7fc88c37f000-7fc88c57e000 ---p 00101000 fd:00 14931 /usr/lib64/libm-2.17.so
7fc88c57e000-7fc88c57f000 r--p 00100000 fd:00 14931 /usr/lib64/libm-2.17.so
7fc88c57f000-7fc88c580000 rw-p 00101000 fd:00 14931 /usr/lib64/libm-2.17.so
7fc88c580000-7fc88c69f000 r-xp 00000000 fd:00 28678 /usr/lib64/libgfortran.so.3.0.0
7fc88c69f000-7fc88c89f000 ---p 0011f000 fd:00 28678 /usr/lib64/libgfortran.so.3.0.0
7fc88c89f000-7fc88c8a0000 r--p 0011f000 fd:00 28678 /usr/lib64/libgfortran.so.3.0.0
7fc88c8a0000-7fc88c8a2000 rw-p 00120000 fd:00 28678 /usr/lib64/libgfortran.so.3.0.0
7fc88c8a2000-7fc88c8b9000 r-xp 00000000 fd:00 15000 /usr/lib64/libpthread-2.17.so
7fc88c8b9000-7fc88cab8000 ---p 00017000 fd:00 15000 /usr/lib64/libpthread-2.17.so
7fc88cab8000-7fc88cab9000 r--p 00016000 fd:00 15000 /usr/lib64/libpthread-2.17.so
7fc88cab9000-7fc88caba000 rw-p 00017000 fd:00 15000 /usr/lib64/libpthread-2.17.so
7fc88caba000-7fc88cabe000 rw-p 00000000 00:00 0
7fc88cabe000-7fc88cac5000 r-xp 00000000 fd:00 15004 /usr/lib64/librt-2.17.so
7fc88cac5000-7fc88ccc4000 ---p 00007000 fd:00 15004 /usr/lib64/librt-2.17.so
7fc88ccc4000-7fc88ccc5000 r--p 00006000 fd:00 15004 /usr/lib64/librt-2.17.so
7fc88ccc5000-7fc88ccc6000 rw-p 00007000 fd:00 15004 /usr/lib64/librt-2.17.so
7fc88ccc6000-7fc88cce8000 r-xp 00000000 fd:00 14916 /usr/lib64/ld-2.17.so
7fc88cced000-7fc88cede000 rw-p 00000000 00:00 0
7fc88cee3000-7fc88cee5000 rw-p 00000000 00:00 0
7fc88cee5000-7fc88cee7000 r-xp 00000000 00:00 0 [vdso]
7fc88cee7000-7fc88cee8000 r--p 00021000 fd:00 14916 /usr/lib64/ld-2.17.so
7fc88cee8000-7fc88cee9000 rw-p 00022000 fd:00 14916 /usr/lib64/ld-2.17.so
7fc88cee9000-7fc88ceea000 rw-p 00000000 00:00 0
7ffcd1fd9000-7ffcd1ffb000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0 0x7FC88C5996F7
#1 0x7FC88C599D3E
#2 0x7FC88BA952EF
#3 0x7FC88BA95277
#4 0x7FC88BA96967
#5 0x7FC88BAD7D36
#6 0x7FC88BAE0498
#7 0x27A5BAD in __module_cu_tiedtke_MOD_cumastr_new
#8 0x27A7F16 in __module_cu_tiedtke_MOD_tiecnv
#9 0x27AD60D in __module_cu_tiedtke_MOD_cu_tiedtke
#10 0x20E428F in __module_cumulus_driver_MOD_cumulus_driver
#11 0x1895C74 in __module_first_rk_step_part1_MOD_first_rk_step_part1
#12 0x1240990 in solve_em_
#13 0x111A215 in solve_interface_
#14 0x475EB0 in __module_integrate_MOD_integrate
#15 0x408113 in __module_wrf_top_MOD_wrf_run

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0@localhost.acanmet] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:0@localhost.acanmet] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@localhost.acanmet] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@localhost.acanmet] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@localhost.acanmet] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@localhost.acanmet] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
[mpiexec@localhost.acanmet] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


This error only appears when I use e_vert = 23. With e_vert = 33 or larger the error disappears and the run completes successfully, but I need to reduce e_vert.

My namelist.input is as follows:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

&time_control
run_days = 0,
run_hours = 48,
run_minutes = 0,
run_seconds = 0,
start_year = 2018, 2018,
start_month = 11, 11,
start_day = 13, 13,
start_hour = 12, 12,
end_year = 2018, 2018,
end_month = 11, 11,
end_day = 15, 15,
end_hour = 12, 12,
interval_seconds = 10800
input_from_file = .true.,.true.,
history_interval = 180, 180,
history_outname = '/home/placa1/disco/Canarias/wrfout_d<domain>_<date>',
frames_per_outfile = 1000, 1000,
restart = .false.,
restart_interval = 7200,
io_form_history = 2
io_form_restart = 2
io_form_input = 2
io_form_boundary = 2
debug_level = 0
/

&domains
time_step = 120,
time_step_fract_num = 0,
time_step_fract_den = 1,
max_dom = 2,
e_we = 80, 82,
e_sn = 70, 52,
e_vert = 23, 23,
p_top_requested = 30000,
num_metgrid_levels = 32,
num_metgrid_soil_levels = 4,
dx = 24000, 8000,
dy = 24000, 8000,
grid_id = 1, 2,
parent_id = 0, 1,
i_parent_start = 1, 27,
j_parent_start = 1, 27,
parent_grid_ratio = 1, 3,
parent_time_step_ratio = 1, 3,
feedback = 1,
smooth_option = 0,
/

&physics
mp_physics = 16, 16,
cu_physics = 6, 6,
ra_lw_physics = 3, 3,
ra_sw_physics = 3, 3,
bl_pbl_physics = 1, 1,
sf_sfclay_physics = 1, 1,
sf_surface_physics = 2, 2,
radt = 30, 30,
bldt = 0, 0,
cudt = 5, 5,
icloud = 1,
num_land_cat = 21,
sf_urban_physics = 0, 0,
/

&fdda
/

&dynamics
hybrid_opt = 2,
w_damping = 0,
diff_opt = 1, 1, 1,
km_opt = 4, 4, 4,
diff_6th_opt = 0, 0, 0,
diff_6th_factor = 0.12, 0.12, 0.12,
base_temp = 290.
damp_opt = 3,
zdamp = 5000., 5000., 5000.,
dampcoef = 0.2, 0.2, 0.2
khdif = 0, 0, 0,
kvdif = 0, 0, 0,
non_hydrostatic = .true., .true., .true.,
moist_adv_opt = 1, 1, 1,
scalar_adv_opt = 1, 1, 1,
gwd_opt = 1,
/

&bdy_control
spec_bdy_width = 5,
specified = .true.
/

&grib2
/

&namelist_quilt
nio_tasks_per_group = 0,
nio_groups = 1,
/
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Could someone help me to solve this problem?

Thanks for the help

Antonio Martin
 
Is there any special reason why you need to reduce the number of vertical levels? WRF needs enough vertical levels (e.g., more than 30) to reach the model lid while keeping the maximum layer thickness below 1 km.

e_vert = 23 is too small, especially for WRFv4, because of the smoother transition of eta levels implemented in this version.
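To put rough numbers on the level-count argument, here is a minimal sketch that estimates the mean layer thickness for a given e_vert and model-top pressure. It is not WRF's eta-level algorithm: it assumes an isothermal atmosphere (250 K), a 1000 hPa surface, and evenly spaced layers, and the function names are made up for this illustration. Real WRF levels are stretched, so the thickest layers will be larger than this mean. The 5000 Pa case is included only as a second, higher model top for comparison with the 30000 Pa in the posted namelist.

```python
import math

R_DRY = 287.0  # gas constant for dry air, J kg^-1 K^-1
G = 9.81       # gravitational acceleration, m s^-2


def model_top_height_m(p_top_pa, p_sfc_pa=100000.0, t_mean_k=250.0):
    """Hypsometric height of the model top (assumption: isothermal layer)."""
    return (R_DRY * t_mean_k / G) * math.log(p_sfc_pa / p_top_pa)


def mean_layer_thickness_m(e_vert, p_top_pa, p_sfc_pa=100000.0, t_mean_k=250.0):
    """Mean thickness if the e_vert levels were evenly spaced in height.

    e_vert counts full levels, so there are e_vert - 1 layers between them.
    """
    n_layers = e_vert - 1
    return model_top_height_m(p_top_pa, p_sfc_pa, t_mean_k) / n_layers


if __name__ == "__main__":
    for p_top in (30000.0, 5000.0):
        for e_vert in (23, 33, 45):
            dz = mean_layer_thickness_m(e_vert, p_top)
            print(f"p_top={p_top:7.0f} Pa  e_vert={e_vert:2d}  mean dz ~ {dz:6.0f} m")
```

Because this is only a mean over evenly spaced layers, treat it as an order-of-magnitude check and not as a statement about what real.exe will actually generate.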

The error message in your email doesn't really help. Please increase the debug_level in your namelist.input.
 
The error message means corrupted memory. The memory address being deallocated doesn't
point to a valid allocated memory address. Crash messages from this problem can look very ugly
and can be much longer and uglier than the original posting.

The probable cause is an out-of-bounds array access, which means the memory allocation bookkeeping was
scribbled on. It sounds like a consequence of using an e_vert that is too small.

Taking a quick glance at the code shows that e_vert=30 doesn't take up a lot of code, so keeping
the default sounds like the perfect solution.
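The gfortran backtrace in the original post already names the routines involved, because gfortran encodes module procedures as `__<module>_MOD_<procedure>`. As a small sketch (the helper and variable names are hypothetical, and the sample text is copied from frames #7 to #12 of the posted backtrace), the following pulls the module and procedure out of each named frame:

```python
import re

# Matches lines such as "#7 0x27A5BAD in __module_cu_tiedtke_MOD_cumastr_new"
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9A-Fa-f]+\s+in\s+(\S+)")
# gfortran module-procedure symbol: __<module>_MOD_<procedure>
MOD_RE = re.compile(r"^__(?P<module>.+)_MOD_(?P<proc>.+)$")

SAMPLE = """\
#7 0x27A5BAD in __module_cu_tiedtke_MOD_cumastr_new
#8 0x27A7F16 in __module_cu_tiedtke_MOD_tiecnv
#9 0x27AD60D in __module_cu_tiedtke_MOD_cu_tiedtke
#10 0x20E428F in __module_cumulus_driver_MOD_cumulus_driver
#11 0x1895C74 in __module_first_rk_step_part1_MOD_first_rk_step_part1
#12 0x1240990 in solve_em_
"""


def parse_frames(text):
    """Return (frame_number, module_or_None, procedure) for each named frame."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if not match:
            continue  # frames without a symbol name are skipped
        number, symbol = int(match.group(1)), match.group(2)
        mod = MOD_RE.match(symbol)
        if mod:
            frames.append((number, mod.group("module"), mod.group("proc")))
        else:
            # External (non-module) procedures carry a trailing underscore.
            frames.append((number, None, symbol.rstrip("_")))
    return frames


if __name__ == "__main__":
    for number, module, proc in parse_frames(SAMPLE):
        print(f"#{number}: {proc} (module: {module})")
```

The innermost named frame is cumastr_new in module_cu_tiedtke, reached through cumulus_driver. That is consistent with the Tiedtke cumulus scheme selected by cu_physics = 6 in the posted namelist, so that routine is a reasonable place to start looking for the out-of-bounds access.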
 
Thanks for your messages.

I need to reduce the number of eta levels because the execution time is too long.

A colleague used my namelist.input on a workstation with 32 cores, and wrf.exe ran successfully with e_vert=23.

I copied the error message above from the shell because it did not appear in rsl.error. It appeared only in the shell, and wrf.exe could not continue.

Thanks for your help.

Antonio.
 