
IO quilting on single server

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled; if you have follow-up questions related to this post, please start a new thread from the forum home page.


New member
I'm running WRF on a 64-core, dual-socket machine with a conventional (non-SSD) hard drive, and I'm seeing the model take a long time to write output steps. Over a 3-hour forecast run, summing up the time spent writing, I'm losing almost half an hour.

My configuration is with Intel MPI, using 64 individual processes, with no IO quilting.

I'm quite surprised by this behaviour because I have no shortage of RAM, so I would have thought the writes would hit the Linux page cache and be flushed out asynchronously, but presumably WRF is doing some kind of forced flush/fsync for data integrity(?).

How can I improve on this? Can I ask WRF not to fsync?

Or is it possible to use quilting in this situation?
I'm not clear on whether I can use IO quilting to have an extra process write out steps asynchronously, and whether that would mean running 64 compute processes + 1 quilt process, or 63 compute processes + 1 quilt process.
If you have a working example namelist for this please share it.
Should I perhaps move this to "Running WRF"? I wonder if not many people read the High-performance Computing section...
You can probably try IO quilting for this case. The general rule for setting the quilting options is as follows.
Suppose the case is run with 384 processors. We can specify the options like below:

nio_tasks_per_group = 5,
nio_groups = 2,

That is: nio_groups × nio_tasks_per_group + nproc_x × nproc_y = 384
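Applied to your 64-rank machine, one possible split (the values here are illustrative, not tuned for your case) would be 63 compute tasks plus a single quilt group of one task, set in the &namelist_quilt section of namelist.input:

```
&namelist_quilt
 nio_tasks_per_group = 1,
 nio_groups          = 1,
/
```

You would still launch 64 MPI ranks total; 63 of them do computation (63 factors as nproc_x = 7 × nproc_y = 9) and 1 rank handles output. Whether one quilt task is enough depends on your output volume, so you may need to experiment.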

I hope the quilting option helps you speed up the IO process.
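As a quick sanity check on the rule above, the arithmetic can be sketched in a few lines. This is an illustrative helper only (the function name `split_tasks` is made up; the parameter names mirror the WRF namelist options):

```python
def split_tasks(total, nio_groups, nio_tasks_per_group):
    """Return (compute_tasks, io_tasks) after reserving quilt servers.

    Illustrative arithmetic only: compute_tasks is what remains for the
    nproc_x * nproc_y decomposition once the quilt (I/O) ranks are set
    aside, per the rule nio_groups * nio_tasks_per_group + nproc_x *
    nproc_y = total.
    """
    io_tasks = nio_groups * nio_tasks_per_group
    compute = total - io_tasks
    if compute <= 0:
        raise ValueError("quilting settings reserve all MPI ranks")
    return compute, io_tasks

# The 384-rank example from above: 2 groups of 5 quilt tasks each
print(split_tasks(384, 2, 5))   # (374, 10)

# The asker's 64-rank machine with a single quilt task
print(split_tasks(64, 1, 1))    # (63, 1)
```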