real.exe: nested run - malloc(): smallbin double linked list corrupted

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

gossaral

New member
Hi,
I am running WRF4.1.1 on a CRAY supercomputer.

I am trying to run a simulation with two domains (nested run with feedback) over Antarctica. I can successfully create all the pre-processing files (geogrid and metgrid), and I am able to run domain 1 and domain 2 individually (not nested). However, when I launch the nested simulation, real.exe crashes while reading the input for domain 2, with the error message:

Assume Noah LSM input
d02 2016-01-01_00:00:00 forcing artificial silty clay loam at 360 points, out of 360
d02 2016-01-01_00:00:00 Timing for processing 0 s.
*** Error in `/scale_wlg_nobackup/filesets/nobackup/vuw03030/WRF/run_nest_test/./real.exe': malloc(): smallbin double linked list corrupted: 0x0000000009e70a60 ***
======= Backtrace: =========
[0x36144ab]
[0x36196a2]

I have tried different processor configurations, but it always crashes at the same point. I have attached my namelist, the logfiles and examples of my met_em* files.

Would you have any idea of the cause of the crash?

Many thanks!
 
Hi Gossaral,
I am not familiar with CRAY and don't know for sure the reason why the case crashed. I will talk to our software engineer and see whether they have any idea. At the same time, if you have more information (e.g., error message, segmentation fault, missing data, etc.), please let us know.
Ming
 
Would you check to see if this works without a nest? Just set max_dom = 1 in the namelist. This will show whether the problem is insufficient memory.
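
For reference, the single-domain test suggested above only needs the max_dom entry changed in namelist.input. Below is a minimal Python sketch of that edit. The helper name set_max_dom and the e_we values in the example text are made up for illustration and are not part of WRF.

```python
import re

def set_max_dom(namelist_text: str, n: int) -> str:
    """Return namelist_text with the max_dom entry set to n (hypothetical helper)."""
    pattern = re.compile(r"^(\s*max_dom\s*=\s*)\d+", re.MULTILINE | re.IGNORECASE)
    # A function replacement avoids ambiguity between the group reference and the digit.
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{n}", namelist_text)
    if count == 0:
        raise ValueError("no max_dom entry found")
    return new_text

# Illustrative fragment only; the grid sizes are invented.
example = """&domains
 max_dom                             = 2,
 e_we                                = 100, 121,
/
"""

print(set_max_dom(example, 1))
```

The per-domain columns (such as e_we) can stay as they are in this sketch; only the max_dom line is rewritten.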
 
Hi davegill and Ming,

Yes, I have tried both domain 1 and domain 2 individually and they ran fine.
I am also in touch with our software engineers at the supercomputer and they have tried a few things. They have told me that

" These are the errors that cause the problem with real.exe:
7 6:50.896 5-6,9-10,12-46,... (47 total)
Memory error detected in module_soil_pre::adjust_for_seaice_pre (module_soil_pre.f90:175):
read/write beyond end of allocation
8 6:50.974 0-4,7-8,11,47,51,... (29 total)
Memory error detected in module_soil_pre::adjust_for_seaice_post (module_soil_pre.f90:333):
read/write beyond end of allocation
"

Is there a dummy nested domains setup for beginners that I could try (easy and not too memory-intensive)? Just to test if my domain is the issue, or the coupling on the cluster...?

Thanks!
 
Also, would you attach the file share/module_soil_pre.f90 (not the *.F file, the *.f90 file). I want to see what lines 175 and 333 are doing.
 
Hi Dave,
I had replied before but there seems to be an issue (timeout with the web page):
I have tried decomposing across more nodes (and our supercomputer software engineers have also tried a few configurations, with no success).

I should add that I am running Polar-WRF (which includes changes to adapt WRF to polar environments), and those changes affected module_soil_pre.f90.

Many thanks,

Alex
 

Attachments

  • module_soil_pre.zip

Hi again Dave,

I have tried the case from the online tutorial (https://www2.mmm.ucar.edu/wrf/OnLineTutorial/CASES/NestRuns/geogrid_2way2inputs.php) on both the supercomputer and my local machine (80-core desktop), and it fails at real.exe on both.

The error message is
free(): invalid pointer (local machine, using Gfortran)

*** Error in `/scale_wlg_nobackup/filesets/nobackup/vuw03030/WRF/run_nest_test/./real.exe': corrupted double-linked list: 0x0000000009655e90 *** (supercomputer, using Cray).

I have tried with both the default WRF4.1.1 and the polar version of it, and neither ran. I have also tried reducing the domains to 51*51 and 22*22.

In summary:
- it does not seem to be linked to a specific platform
- it does not seem to be linked to WRF/Polar-WRF
- it does not seem to be linked to a specific domain (geographic location or grid size)

Finally, I would be happy to try, but... what is vanilla WRF?

Many thanks,

Alex
 
Hi Dave,

Apologies, I have just figured out what "vanilla WRF" means :)
I have re-downloaded WRF4.1.1 from scratch and tried with that clean version, and I get the same error message.

Thanks!

Alex
 
Alex,
Please also run WPS V4.1 to produce met_em files. You can download GFS data from https://rda.ucar.edu/datasets/ds084.1/, and ungrib the files using Vtable.GFS.
I am concerned that the data on the WRF tutorial webpage may be old. It is always good to prepare your own met_em data.
Ming
 
Hi, I managed to get the simulation to run by changing
input_from_file = .true.,.true.
to
input_from_file = .true.,.false.

Could you please explain the difference between the two? I had met_em files for both domains, and was planning to run a nested simulation with two domains.

Thanks!

Alex
 
Hello, I need help. I am a new user of WRF and I encountered this error while running real.exe:
Domain # 1: dx = 24000.000 m
Domain # 2: dx = 8000.000 m
REAL_EM V4.2.2 PREPROCESSOR
*************************************
Parent domain
ids,ide,jds,jde 1 380 1 390
ims,ime,jms,jme -4 385 -4 395
ips,ipe,jps,jpe 1 380 1 390
*************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
 
Alex,
"input_from_file = True, True," indicates that REAL will produce wrfinput_d01 and wrfinput_d02 for the nested case.
"input_from_file = True, False," indicates that REAL will only produce wrfinput_d01. The initial conditions for d02 will be derived from wrfinput_d01 once the model starts running.
The former option is better than the latter because wrfinput_d02 will include finer surface information (terrain height, landuse, etc.). With the latter option, the d02 input is produced by interpolation from wrfinput_d01, so that detailed high-resolution information for D02 is lost.
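
To make the distinction above concrete, here is a small Python sketch that lists which wrfinput files REAL would write for a given input_from_file setting, following the explanation in this post. The function name real_outputs is hypothetical and only illustrates the rule; it is not WRF code.

```python
def real_outputs(input_from_file):
    """List the wrfinput files REAL writes, per the rule described above.

    input_from_file: one boolean per domain, d01 first.
    Domain 1 always gets a wrfinput file; a nest gets one only when its flag is True.
    """
    files = []
    for dom, from_file in enumerate(input_from_file, start=1):
        if dom == 1 or from_file:
            files.append(f"wrfinput_d{dom:02d}")
    return files

print(real_outputs([True, True]))
print(real_outputs([True, False]))
```

In the second case the nest has no file of its own, so its fields come from interpolating the parent domain when the model starts.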
 
I have attached a screenshot of the error I get:
taskid: 0 hostname: geog-HP
module_io_quilt_old.F 2931 T
Ntasks in X 1 , ntasks in Y 1
Domain # 1: dx = 24000.000 m
Domain # 2: dx = 8000.000 m
REAL_EM V4.2.2 PREPROCESSOR
*************************************
Parent domain
ids,ide,jds,jde 1 380 1 390
ims,ime,jms,jme -4 385 -4 395
ips,ipe,jps,jpe 1 380 1 390
*************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
 

Attachments

  • Screenshot from 2021-07-14 14-04-08.png
  • namelist.input
  • namelist.wps

Hi again,

You are right, and since I use Polar-WRF in polar stereographic over the pole, WRF does not let me run a nest without its own input file (input_from_file = .true.,.false.).

This means that I am back to my initial error message
*** Error in `/real.exe': malloc(): smallbin double linked list corrupted: 0x0000000049c01520 ***
And I still do not know how to solve it.

Thanks!
 
Hi,
I looked at your namelist.wps and namelist.input. The only issue that concerns me is that stand_lon is set to a different value from ref_lon. It is recommended to set them to the same value. However, I don't think this is the reason that REAL cannot run.
Also, your post indicated that you run with a polar-stereographic map projection, but your namelist.wps shows a Mercator projection. Please confirm which one you used.
The screenshot doesn't show any error message. Can you examine all your RSL files? The error message could be in any of them and is not necessarily in rsl.out.0000 or rsl.error.0000.
 
Hi Ming,

I think there is some confusion. Someone else (odongoronieplanks) started posting in my thread and seems to have a different problem. I do use polar stereographic, not Mercator, and the error message I posted initially is the one I get.

Thanks!
 