
IO quilting on single server

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled; if you have follow-up questions related to this post, please start a new thread from the forum home page.


New member
I'm running WRF on a 64-core, dual-socket machine with a conventional (non-SSD) hard drive, and I'm seeing the model take a long time to write output steps. Over a 3-hour forecast run, summing up the time spent writing, I'm losing almost half an hour.

My configuration is with Intel MPI, using 64 individual processes, with no IO quilting.

I'm quite surprised by this behaviour because I have no shortage of RAM, so I would have thought the writes would hit the Linux page cache and be flushed out asynchronously, but presumably WRF is doing some kind of forced flush/fsync for data integrity(?).

How can I improve on this? Can I ask WRF not to fsync?

Or is it possible to use quilting in this situation?
I'm not clear on whether I can use IO quilting to have an extra process write out steps asynchronously, and whether that would mean running 64 compute processes + 1 quilt process, or 63 compute processes + 1 quilt process.
If you have a working example namelist for this please share it.
Should I perhaps move this to "Running WRF"? I wonder if not many people read the High-performance Computing section...
You can probably try IO quilting for this case. The general rule for setting the quilting options is as follows.
Suppose the case is run with 384 processors. We can specify the options like below:

nio_tasks_per_group = 5,
nio_groups = 2,

That is: nio_groups × nio_tasks_per_group + nproc_x × nproc_y = 384
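Applied to your 64-rank machine, one possible split (the values here are illustrative, not tuned for your case) would be 63 compute tasks plus a single quilt group of one task, set in the &namelist_quilt section of namelist.input:

```
&namelist_quilt
 nio_tasks_per_group = 1,
 nio_groups          = 1,
/
```

You would still launch 64 MPI ranks total; 63 of them do computation (63 factors as nproc_x = 7 × nproc_y = 9) and 1 rank handles output. Whether one quilt task is enough depends on your output volume, so you may need to experiment.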

I hope the quilting option helps you speed up the IO process.
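As a quick sanity check on the rule above, the arithmetic can be sketched in a few lines. This is an illustrative helper only (the function name `split_tasks` is made up; the parameter names mirror the WRF namelist options):

```python
def split_tasks(total, nio_groups, nio_tasks_per_group):
    """Return (compute_tasks, io_tasks) after reserving quilt servers.

    Illustrative arithmetic only: compute_tasks is what remains for the
    nproc_x * nproc_y decomposition once the quilt (I/O) ranks are set
    aside, per the rule nio_groups * nio_tasks_per_group + nproc_x *
    nproc_y = total.
    """
    io_tasks = nio_groups * nio_tasks_per_group
    compute = total - io_tasks
    if compute <= 0:
        raise ValueError("quilting settings reserve all MPI ranks")
    return compute, io_tasks

# The 384-rank example from above: 2 groups of 5 quilt tasks each
print(split_tasks(384, 2, 5))   # (374, 10)

# The asker's 64-rank machine with a single quilt task
print(split_tasks(64, 1, 1))    # (63, 1)
```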