
Memory Leak in WRF 4.2.1?

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

lnpilz

Hi all,

When I was running long simulations (on the order of 10 h or longer), I encountered random aborts due to insufficient memory. I was finally able to get a dedicated compute node with enough compute time to reproduce and monitor the problem (shoutout to DKRZ Support), and it seems to me that there is a memory leak somewhere in the codebase (cf. mem_leak.png, mem_usage.csv).

[Figure: mem_leak.png, memory usage over the course of the simulation]
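For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of sampling a process's resident memory over time. It assumes Linux (it reads `VmRSS` from `/proc/<pid>/status`). It is not how `mem_usage.csv` was produced, and the function names and CSV columns are hypothetical. In an MPI run each `wrf.exe` rank is a separate process, so you would monitor each rank's PID (e.g. as listed by `pgrep wrf.exe`).

```python
import csv
import time
from typing import Optional


def read_rss_kib(pid: int) -> int:
    """Return the resident set size (VmRSS) of `pid` in KiB (Linux /proc only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:   123456 kB"
                return int(line.split()[1])
    raise RuntimeError(f"no VmRSS entry for pid {pid}")


def monitor(pid: int, out_path: str, interval_s: float = 60.0,
            max_samples: Optional[int] = None) -> None:
    """Write (elapsed seconds, RSS in KiB) rows to a CSV until the process
    exits or `max_samples` rows have been written."""
    t0 = time.monotonic()
    written = 0
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["elapsed_s", "rss_kib"])
        while max_samples is None or written < max_samples:
            try:
                rss = read_rss_kib(pid)
            except (FileNotFoundError, RuntimeError):
                break  # process is gone (or a zombie without VmRSS)
            writer.writerow([round(time.monotonic() - t0, 1), rss])
            f.flush()  # keep the file usable if the job is killed
            written += 1
            time.sleep(interval_s)
```

Calling `monitor(pid, "rss_rank0.csv", interval_s=60)` in a background process alongside the job samples that rank once per minute until it exits. Flushing after every row means the data up to an out-of-memory abort is still on disk.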

Before I start debugging, I wanted to ask whether this is a known issue; I couldn't find anything on the forums or in the GitHub issues. Personally, I can't think of anything that should accumulate in memory over a simulation, but it might also be expected behaviour. Also, if this is a bug, do you have any idea where to start looking?

Thanks in advance :)

Attachments

  • mem_usage.csv (1.8 MB)
  • namelist.input (2.7 KB)
Hi,
It's really difficult to say whether this is a known issue, as it could be related to many different things. I would first test whether this happens with any domain and any date/data, and especially whether it happens with the latest version of WRF (V4.3). If you can reproduce the problem by starting from a restart file written just before the time you see the issue, you can test with that instead of running the full time span again. Since you mention it only happens in long runs, that may not work, but it may be worth a try if you haven't already tested it.
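To pick the restart file to start from, here is a sketch that computes the name of the last restart file written before the time the problem appears. It assumes restart files are written every `restart_interval` minutes counted from the simulation start (none at the start itself) and follow the default `wrfrst_d<domain>_<YYYY-MM-DD_HH:MM:SS>` naming. The helper itself is hypothetical.

```python
from datetime import datetime, timedelta


def last_restart_before(start: datetime, issue_time: datetime,
                        restart_interval_min: int, domain: int = 1) -> str:
    """Name of the last restart file written at or before `issue_time`.

    Assumes one restart file every `restart_interval_min` minutes after `start`
    (none at `start` itself) and the default wrfrst_dNN_<date> naming.
    """
    step = timedelta(minutes=restart_interval_min)
    n_intervals = (issue_time - start) // step  # whole intervals elapsed
    if n_intervals < 1:
        raise ValueError("no restart file is written before issue_time")
    t = start + n_intervals * step
    return f"wrfrst_d{domain:02d}_{t:%Y-%m-%d_%H:%M:%S}"


# Hypothetical run: started 2021-06-01 00:00, restarts every 6 h, problem seen at 13:30.
print(last_restart_before(datetime(2021, 6, 1), datetime(2021, 6, 1, 13, 30), 360))
```

The test run would then start at that file's timestamp with `restart = .true.` in `&time_control`.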
 
Hi Kelly,

thanks for reaching out, and sorry for the delay. Although I didn't see anything related to memory leaks in the release notes between 4.2.1 and 4.3, I will upgrade to 4.3 and report back. However, I don't really see why changing the date or domain should have any impact on memory usage. A memory leak should not occur regardless of domain, date, or input data.

Also, if you have a look at the memory usage graph, this problem is visible from the very beginning of the simulation.
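Whether growth starts right away can be quantified by fitting a straight line to the memory samples. Here is an illustrative least-squares fit in plain Python. It assumes the samples have already been loaded as elapsed seconds and MiB (the thread does not describe the column layout of mem_usage.csv), and the function names are hypothetical. A curve that flattens after initialization would point to normal allocation. A steady positive slope from the start is consistent with a leak.

```python
def growth_rate_mib_per_h(times_s, mem_mib):
    """Ordinary least-squares slope of memory (MiB) against time (s), in MiB per hour."""
    n = len(times_s)
    if n != len(mem_mib) or n < 2:
        raise ValueError("need two equally long sequences with at least two samples")
    mean_t = sum(times_s) / n
    mean_m = sum(mem_mib) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in zip(times_s, mem_mib))
    var = sum((t - mean_t) ** 2 for t in times_s)
    if var == 0:
        raise ValueError("all samples have the same timestamp")
    return cov / var * 3600.0


def hours_until_limit(current_mib, limit_mib, rate_mib_per_h):
    """Rough time until `limit_mib` is reached, assuming growth stays linear."""
    if rate_mib_per_h <= 0:
        return float("inf")
    return max(0.0, (limit_mib - current_mib) / rate_mib_per_h)


# Synthetic example: 1000 MiB at start, growing by 50 MiB every hour.
times = [0, 3600, 7200, 10800]
mem = [1000, 1050, 1100, 1150]
rate = growth_rate_mib_per_h(times, mem)
print(f"{rate:.1f} MiB/h, ~{hours_until_limit(mem[-1], 2000, rate):.1f} h to a 2000 MiB limit")
```

The extrapolation is only a rough estimate. If memory grows in steps (e.g. at output times), a linear fit averages over them.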

Thanks again and I will report back asap,

Lukas
Lukas,
We don't expect there to be memory leaks, but if you can pinpoint the factor that triggers the leak, it can help guide us toward a fix. For example, if it only happened when you ran a nested case, it could be related to the feedback option, or to a particular resolution.
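One way to organise that search is to enumerate a small matrix of test runs that differ in the suspected factors. Here is a sketch that builds such a list of run settings. `max_dom` and `feedback` correspond to WRF `&domains` namelist options. The `dx_km` values and the function name are placeholders chosen for illustration. Memory growth would then be monitored for each run and the results compared.

```python
from itertools import product

# max_dom and feedback mirror WRF &domains options; dx_km values are placeholders.
FACTORS = {
    "max_dom": [1, 2],
    "feedback": [0, 1],
    "dx_km": [9, 3],
}


def experiment_matrix(factors):
    """Return one settings dict per combination of factor values
    (full factorial, minus redundant combinations)."""
    names = list(factors)
    runs = []
    for values in product(*(factors[name] for name in names)):
        run = dict(zip(names, values))
        # feedback only has an effect with a nest, so drop redundant single-domain runs
        if run.get("max_dom") == 1 and run.get("feedback") == 1:
            continue
        runs.append(run)
    return runs


for i, run in enumerate(experiment_matrix(FACTORS), start=1):
    print(f"run{i:02d}", run)
```

A full factorial design like this catches interactions (e.g. a leak that only appears with a nest *and* feedback). The cost is more runs than changing one factor at a time.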
 