daniel_lloveras
Hi there,
I'm running a high-resolution idealized baroclinic wave simulation in WRF 3.6.1 (2400 x 1800 horizontal grid points, 4 km resolution) on an HPC machine. When I run wrf.exe with the attached namelist.input file, the model runs fine until the restart file is due to be written (2.5 days into the simulation). At that point, before it even attempts to write the restart file, the model aborts with the following error, which appears at about line 18035 of the attached rsl.error file:
0: MPICH2 Error: Failed to register memory address 0x2aabf5ba4000 with length of 0x92000 (598016) bytes.
0: Unable to register memory at the requested address. This may indicate an application bug. See process virtual memory mappings below:
Has anyone seen this MPI memory registration error before, and if so, how should I go about debugging it?
One option, if the problem is that I do not have enough memory to write a single wrfrst file, would be to set "io_form_restart = 102" so that each processor writes its own restart file. However, in another run I wrote output frequently (unlike in the attached namelist.input file, which specifies that no wrfout files are written), and that output went into a single wrfout file that is larger than I expect the restart file to be. So it appears that I have enough memory to write large files, and that this problem is specific to restart files.
I will eventually try the "io_form_restart = 102" namelist option and the joiner script available here: http://www2.mmm.ucar.edu/wrf/users/special_code.html, but I figured it would be best to ask here for other potential solutions before going down that route.
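For reference, the change I have in mind would look roughly like the fragment below in the &time_control section of namelist.input. This is only a sketch: the restart_interval value is an assumption (2.5 days expressed in minutes), not copied from the attached file, and all other entries in the section are omitted.

```
&time_control
 restart          = .false.,   ! initial run, not a restart (assumed)
 restart_interval = 3600,      ! minutes; 2.5 days, assumed for illustration
 io_form_restart  = 102,       ! split restart output: one wrfrst file per processor
/
```

With this setting, each processor should write its own wrfrst piece rather than funneling the whole domain into a single file, and the joiner script would only be needed if one combined file is required afterwards.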
Thanks in advance for the help!
Best,
Daniel