Issue in creating static file with GWDO

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled, and if you have follow-up questions related to this post, please start a new thread from the forum home page.

EMMANUEL

Member
When I tried running the preprocessing for the uniform 3 km mesh, it stops here without any error:

Assigning remaining dimensions from definitions in Registry.xml ...
THREE = 3
nVertLevels = 1 (config_nvertlevels)
nSoilLevels = 1 (config_nsoillevels)
nFGLevels = 1 (config_nfglevels)
nFGSoilLevels = 1 (config_nfgsoillevels)
nVertLevelsP1 = 2

----- done assigning dimensions from Registry.xml -----


real-data GFS test case    ****** why is GFS mentioned here?

Computing GWDO static fields on the native MPAS mesh

--- Using GMTED2010 terrain dataset for GWDO static fields


and in the log file it is indicated as:
[NID 00697] 2021-08-20 20:00:17 Apid 183188911: OOM killer terminated this process.
[NID 00700] 2021-08-20 20:00:17 Apid 183188911: OOM killer terminated this process.
[NID 00059] 2021-08-20 20:00:17 Apid 183188911: OOM killer terminated this process.
Application 183188911 exit signals: Killed
Application 183188911 resources: utime ~0s, stime ~231s, Rss ~6300, inblocks ~0, outblocks ~0

Kindly let me know about this.
 
My guess would be that "OOM" in your log file means "out of memory". The computation of the sub-grid-scale orography fields for the GWDO scheme currently requires each MPI task to read in the global 30-arc-second terrain dataset, which is just under 4 GB. What we typically do when processing GWDO static fields on large meshes is to under-subscribe nodes so that we don't exceed memory capacity on each node. I don't have specific figures available, but a reasonable starting point might be to assume that each MPI task will allocate around 5 or 6 GB, so you can divide the available memory on each node by 5 or 6 GB to get an upper bound on the number of MPI ranks to schedule on each node. If you still see OOM errors, you can increase the node count and proportionately decrease the number of MPI tasks on each node until the job fits in memory.
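As a rough sizing sketch (the per-node memory and per-rank figures below are placeholder assumptions, not measured values), the arithmetic looks something like this:

# Back-of-the-envelope sizing for GWDO static processing
MEM_PER_NODE_GB=45    # usable memory per node -- replace with your system's value
GB_PER_RANK=6         # assumed footprint of each MPI task reading the ~4 GB terrain data
RANKS_PER_NODE=$(( MEM_PER_NODE_GB / GB_PER_RANK ))
echo "Schedule at most ${RANKS_PER_NODE} MPI ranks per node"   # prints 7 with these numbers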

(For future reference, this thread is a follow-on to an earlier thread about processing static fields on the 3-km quasi-uniform mesh.)
 
Sure, thank you for the suggestion.
I would like to mention that I have used 'x1.65536002.graph.info.part.8208' for processing the GWDO static fields.

Can you kindly suggest how to carry out this step in my job submission script?

'MPI task will allocate around 5 or 6 GB, so you can divide the available memory on each node by 5 or 6 GB to get an upper bound on the number of MPI ranks to schedule on each node'
 
Questions about how to prepare job submission scripts on particular machines other than NCAR's primary computing system, Cheyenne, are probably outside the scope of the assistance we are able to provide. I'd suggest working with someone at your institution on this. The essential idea is that you'll need to use a smaller number of MPI ranks per node such that, if each MPI rank were to allocate, say, 6 GB of memory, the total available memory on each node would not be exceeded.

For example, on NCAR's Cheyenne system, we have around 45 GB of usable memory on each node, so the GWDO static processing step should use at most 45 GB per node / 6 GB per MPI rank = 7 MPI ranks per node. Again, 6 GB per MPI rank is just a guess, and you may find that you need to further decrease the number of MPI ranks that are scheduled on each node.
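As an illustration only (the node count and wall-clock limit below are assumptions; what matters is that the total MPI task count matches the graph.info.part.XXXX file and that the ranks per node stay within the memory bound), a PBS job script on a Cheyenne-like system with 36-core nodes might look like:

#!/bin/bash
#PBS -N static_gwdo
#PBS -l walltime=06:00:00
# 1368 nodes x 6 MPI ranks per node = 8208 tasks, matching x1.65536002.graph.info.part.8208,
# while 6 ranks per node stays under the ~45 GB / 6 GB-per-rank = 7 ranks-per-node bound
#PBS -l select=1368:ncpus=36:mpiprocs=6

cd $PBS_O_WORKDIR                      # run from the submission directory
mpiexec_mpt ./init_atmosphere_model    # or your system's MPI launcher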
 
I was able to produce the static file with GWDO now. To produce the init file, shall I continue with the same graph.info file (x1.65536002.graph.info.part.8208) that I used for creating the static file with GWDO, continue using 'x1.65536002.graph.info.part.16', or create a new graph.info file with 256 partitions? Can you kindly advise me on this, so that I can use the available resources efficiently?
 
The remaining preprocessing steps and the model simulation should work with any graph.info.part.XXXX file, and provided you use enough nodes to gain access to sufficient aggregate memory, you should be able to use fully-subscribed nodes.
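For example (a hypothetical sketch assuming 36-core nodes; the key constraint is that the number of MPI tasks must match the .part.XXXX suffix of the partition file named by the block decomposition prefix in the namelist), reusing the existing part.8208 file on fully-subscribed nodes could look like:

# 228 nodes x 36 ranks per node = 8208 MPI tasks, fully subscribed,
# so MPAS will look for x1.65536002.graph.info.part.8208
#PBS -l select=228:ncpus=36:mpiprocs=36
mpiexec_mpt ./init_atmosphere_model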
 
Thank you for the suggestion
 