WRF with asynchronous I/O

Han994

New member
Hi,

I am running WRF at a horizontal resolution of ~9 km on an 859 × 859 grid, and I want to activate the asynchronous I/O (quilting) option. I am currently using 256 cores for computation, with nproc_x = 16 and nproc_y = 16. What would be appropriate values for nio_groups and nio_tasks_per_group?

I have tried a few combinations. With nio_groups = 1 and nio_tasks_per_group = 4 I got the message below; nio_groups = 2 with nio_tasks_per_group = 16 produced the same message. Is this caused by running out of memory?
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 256 PID 457264 RUNNING AT m3ca0705
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 257 PID 457265 RUNNING AT m3ca0705
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 258 PID 457266 RUNNING AT m3ca0705
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 259 PID 457267 RUNNING AT m3ca0705
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 260 PID 457268 RUNNING AT m3ca0705
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 261 PID 457269 RUNNING AT m3ca0705
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
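
For reference, these settings live in the &namelist_quilt section of namelist.input. A minimal sketch of the first combination described above (the values are the ones quoted in this post; the inline comments are just annotations):

&namelist_quilt
 nio_tasks_per_group = 4,   ! dedicated I/O server tasks per group; 0 disables quilting
 nio_groups          = 1,   ! number of independent I/O server groups
/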
 
Unfortunately the quilt option doesn't work well, and we frequently see issues with it. The problem has been around for a while and our software engineers haven't found a solution yet.
I would suggest turning this option off and rerunning the case. It is understandable that I/O might be slow for an 859 × 859 grid, but I expect it should still work.
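
If it helps, a minimal sketch of what turning it off looks like (nio_tasks_per_group = 0 is the WRF default and disables quilting):

&namelist_quilt
 nio_tasks_per_group = 0,   ! 0 = no dedicated I/O servers; quilting off
 nio_groups          = 1,
/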
 
Thanks for your reply. The model did work with the quilt option turned off; it takes about 3.5 hours for a 5-day forecast. I turned quilting on because we would like to see whether asynchronous I/O can speed up the simulation on our HPC system. I will try a few more combinations.
 
Hi all,
In my case, with a 1657*751 domain, I/O quilting is very beneficial.
If you decompose the domain into 16*16 patches, you also need to add the MPI tasks for the I/O, so you must request 16*16 + nio_groups*nio_tasks_per_group tasks in total (see the worked example after this post).
I decomposed my domain with nproc_x = 16 and nproc_y = 64 (y-elongated tiling is suggested), with nio_tasks_per_group = 4 and nio_groups = 1.
Good luck
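
To make the task arithmetic concrete, here is a sketch for the 256-core case discussed above; the mpirun invocation is an assumption and the exact launcher syntax depends on your MPI stack and scheduler:

&domains
 nproc_x = 16,              ! compute decomposition in x
 nproc_y = 16,              ! compute decomposition in y
/
&namelist_quilt
 nio_tasks_per_group = 4,
 nio_groups          = 1,
/

# compute tasks: nproc_x * nproc_y                = 16*16 = 256
# I/O tasks:     nio_groups * nio_tasks_per_group = 1*4   = 4
mpirun -np 260 ./wrf.exe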
 
Excuse me, when you say your grid is 1657 by 751 and nx = 16, ny = 64, I assume you mean x-elongated?
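(For the arithmetic behind the question: 1657/16 ≈ 104 grid points per task in x versus 751/64 ≈ 12 in y, so each patch ends up elongated in the x direction.)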
 
Hi,
Yes, you are right, sorry for that. I found the attached document online, which could be useful for you.
You can also take a look at this paper:
Balle and Johnsen (2016), Improving I/O Performance of the Weather Research and Forecast (WRF) Model
 