WRF with asynchronous I/O

Han994

New member
Hi,

I am running WRF at a horizontal resolution of ~9 km on an 859 × 859 grid, and I want to activate the asynchronous I/O (quilting) option. I am currently using 256 cores for computation, with nproc_x = 16 and nproc_y = 16. What would be appropriate values for nio_groups and nio_tasks_per_group?

I have tried a few combinations. With nio_groups = 1 and nio_tasks_per_group = 4 I got the message below; nio_groups = 2 with nio_tasks_per_group = 16 produced the same message. Is this caused by running out of memory?
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 256 PID 457264 RUNNING AT m3ca0705
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 257 PID 457265 RUNNING AT m3ca0705
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 258 PID 457266 RUNNING AT m3ca0705
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 259 PID 457267 RUNNING AT m3ca0705
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 260 PID 457268 RUNNING AT m3ca0705
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 261 PID 457269 RUNNING AT m3ca0705
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
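
For reference, these settings live in the &namelist_quilt section of namelist.input. A minimal sketch of the first combination described above (the values are the ones quoted in this post; the inline comments are just annotations):

&namelist_quilt
 nio_tasks_per_group = 4,   ! dedicated I/O server tasks per group; 0 disables quilting
 nio_groups          = 1,   ! number of independent I/O server groups
/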
 
Unfortunately the quilt option doesn't work well, and we frequently see issues with it. The problem has been around for a while and our software engineers haven't found a solution yet.
I would suggest turning this option off and rerunning the case. It is understandable that I/O might be slow for an 859 × 859 grid, but I expect it should still work.
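
If it helps, a minimal sketch of what turning it off looks like (nio_tasks_per_group = 0 is the WRF default and disables quilting):

&namelist_quilt
 nio_tasks_per_group = 0,   ! 0 = no dedicated I/O servers; quilting off
 nio_groups          = 1,
/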
 
Thanks for your reply. The model did work with the quilt option turned off; it takes about 3.5 hours for a 5-day forecast. I turned quilting on because we would like to see whether asynchronous I/O can speed up the simulation on our HPC system. I will try a few more combinations.
 
Hi all,
In my case, with a 1657*751 domain, I/O quilting is very beneficial.
If you decompose the domain into 16*16 patches, you also need to add the MPI tasks for the I/O, so you must request 16*16 + nio_groups*nio_tasks_per_group tasks in total (see the worked example after this post).
I decomposed my domain with nproc_x = 16 and nproc_y = 64 (y-elongated tiling is suggested), with nio_tasks_per_group = 4 and nio_groups = 1.
Good luck
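
To make the task arithmetic concrete, here is a sketch for the 256-core case discussed above; the mpirun invocation is an assumption and the exact launcher syntax depends on your MPI stack and scheduler:

&domains
 nproc_x = 16,              ! compute decomposition in x
 nproc_y = 16,              ! compute decomposition in y
/
&namelist_quilt
 nio_tasks_per_group = 4,
 nio_groups          = 1,
/

# compute tasks: nproc_x * nproc_y                = 16*16 = 256
# I/O tasks:     nio_groups * nio_tasks_per_group = 1*4   = 4
mpirun -np 260 ./wrf.exe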
 
Excuse me, when you say your grid is 1657 by 751 and nx = 16, ny = 64, I assume you mean x-elongated?
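(For the arithmetic behind the question: 1657/16 ≈ 104 grid points per task in x versus 751/64 ≈ 12 in y, so each patch ends up elongated in the x direction.)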
 
Hi,
Yes, you are right, sorry for that. I found the attached document online, which could be useful for you.
You can also take a look at this paper:
Balle and Johnsen (2016), Improving I/O Performance of the Weather Research and Forecast (WRF) Model
 