"Unable to generate processor mesh" error

MattW

Hi WRF Forum folks,

I've encountered a rather strange error in moving some WRF ensemble code from Cheyenne to Derecho here at NCAR. When I try to run an ensemble, a random subset of members fails each time with this error:

taskid: 0 hostname: dec0460
module_io_quilt_old.F 2931 F
MPASPECT: UNABLE TO GENERATE PROCESSOR MESH. STOPPING.
PROCMIN_M 1
PROCMIN_N 1
P -71
MINM 1
MINN -71
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 125
module_dm: mpaspect
-------------------------------------------
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

I thought this might be related to my nio_tasks_per_group or nio_groups settings in namelist.input, but I've tried several combinations of these (1/4, 4/4, 6/6, 6/12, 12/12) along with several different numbers of processors and nodes, and I haven't seen any improvement. The most confusing part is that different ensemble members fail each time, even when the exact same settings are run twice in a row. Have you encountered anything like this?
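A note on reading the log above: the printed values are what a "most nearly square factor pair" search would leave behind if it were handed a negative task count. The sketch below is a Python illustration of that reading, not WRF source; the function name, the argument defaults, and the assumption that the mesh routine searches the factor pairs of P this way are all illustrative. It does not explain how P became -71 in the first place.

```python
def mpaspect(p, procmin_m=1, procmin_n=1):
    """Pick the factor pair (m, n) of p that is closest to square.

    Illustrative only: assumes the processor mesh is chosen by scanning
    the divisors of p, which is not confirmed anywhere in this thread.
    """
    best_gap = 2 * p
    min_m, min_n = 1, p  # fallback when no factor pair is accepted
    for m in range(1, p + 1):  # empty loop when p is zero or negative
        if p % m == 0:
            n = p // m
            if abs(m - n) < best_gap and m >= procmin_m and n >= procmin_n:
                best_gap = abs(m - n)
                min_m, min_n = m, n
    if min_m < procmin_m or min_n < procmin_n:
        raise ValueError(
            f"unable to generate processor mesh: P={p} MINM={min_m} MINN={min_n}"
        )
    return min_m, min_n
```

Under these assumptions, `mpaspect(72)` returns `(8, 9)`, while `mpaspect(-71)` never enters the loop, keeps the fallback values MINM = 1 and MINN = -71, and fails the minimum check. That is the same pair of values the log reports.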

Thanks,

Matt Wilson
 
Matt,
I ran a few test cases on Derecho and did not run into the issue you describe. Could you please tell me the path where your case is located? Also, which version of WRF are you running? Thanks.
 
@Liuwh @MattW
If you run the failed members separately, do they complete?
I suspect this is a machine issue rather than something wrong in WRF.
 
A different random subset of members seems to fail each time, so we can eventually get all of the members by running our script multiple times. I've also submitted a ticket with CISL asking whether this is specific to Derecho, but I haven't gotten an answer from them yet.
 
Thanks for the update. This confirms that the issue is a machine issue. Please keep me updated if you get answers from CISL. Thanks in advance.
 
I solved it! Add the following to the end of the namelist.input file in your WRF run directory:

&namelist_quilt
nio_tasks_per_group = 0,
nio_groups = 1,
/

That'll do it.
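For an ensemble with many member directories, the same change can be scripted. Here is a minimal Python sketch, assuming each member has its own namelist.input; the helper name `disable_quilting` and the regular expression are illustrative and not part of WRF. It replaces an existing &namelist_quilt group if one is present, so the group is not defined twice, and appends the block otherwise.

```python
import re

# The block from the post above, written out as namelist text.
QUILT_BLOCK = "&namelist_quilt\n nio_tasks_per_group = 0,\n nio_groups = 1,\n/\n"

# Matches an existing &namelist_quilt group through its closing "/" line.
_QUILT_RE = re.compile(
    r"^[ \t]*&namelist_quilt\b.*?^[ \t]*/[ \t]*$\n?",
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)


def disable_quilting(namelist_text):
    """Return namelist text containing exactly the quilt block above."""
    if _QUILT_RE.search(namelist_text):
        # Replace the first existing group instead of adding a second one.
        return _QUILT_RE.sub(lambda _m: QUILT_BLOCK, namelist_text, count=1)
    if namelist_text and not namelist_text.endswith("\n"):
        namelist_text += "\n"
    return namelist_text + "\n" + QUILT_BLOCK
```

To apply it to one member, read that member's namelist.input into a string, pass it through `disable_quilting`, and write the result back. Running it a second time on the result leaves the text unchanged.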
 