Hi,
I'm running WRF on a 64-core, dual-socket machine with a conventional (non-SSD) hard drive, and the model is taking a long time to write output steps. Over a 3-hour forecast run, summing up the time spent writing, I'm losing almost half an hour.
My configuration uses Intel MPI with 64 MPI processes and no I/O quilting.
I'm quite surprised by this behaviour: I have no shortage of RAM, so I would have expected the writes to land in the Linux page cache and be flushed to disk asynchronously. Presumably WRF is doing some kind of forced flush/fsync for data integrity?
How can I improve on this? Can I tell WRF not to fsync? Or is it possible to use quilting in this situation?
I'm not clear whether I can use I/O quilting to dedicate an extra process to writing output asynchronously, and whether that would mean running 64 compute processes + 1 quilt process (65 MPI ranks total) or 63 compute processes + 1 quilt process (staying at 64).
If you have a working example namelist for this please share it.
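For concreteness, here is my untested guess at the relevant section of namelist.input, based on the &namelist_quilt block mentioned in the docs (the specific values are assumptions on my part):

```fortran
&namelist_quilt
 nio_tasks_per_group = 1,   ! my guess: one dedicated I/O (quilt) server per group
 nio_groups          = 1,   ! my guess: a single I/O group
/
```

My understanding is that the quilt servers are extra MPI ranks on top of the compute decomposition, so with this I would launch 65 ranks rather than 64 — but I'd appreciate confirmation on that from someone who has run this way.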