"Unable to generate processor mesh" error

MattW
Hi WRF Forum folks,

I've encountered a rather strange error in moving some WRF ensemble code from Cheyenne to Derecho here at NCAR. When I try to run an ensemble, a random subset of members fails each time with this error:

taskid: 0 hostname: dec0460
module_io_quilt_old.F 2931 F
MPASPECT: UNABLE TO GENERATE PROCESSOR MESH. STOPPING.
PROCMIN_M 1
PROCMIN_N 1
P -71
MINM 1
MINN -71
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 125
module_dm: mpaspect
-------------------------------------------
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

I thought this might be related to my nio_tasks_per_group or nio_groups settings in namelist.input, but I've tried several combinations of these (1/4, 4/4, 6/6, 6/12, 12/12) along with several different numbers of processors and nodes, and I haven't seen any improvement. The most confusing part is that different ensemble members fail each time, even when the exact same settings are run twice in a row. Have you encountered anything like this?
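For anyone checking their own settings against the log above, here is a minimal sketch of the task arithmetic. It assumes that WRF reserves nio_groups × nio_tasks_per_group MPI ranks as I/O (quilt) servers and passes the remainder to MPASPECT as P. The function names are hypothetical and not part of WRF. Under that assumption, a negative P means the job saw fewer ranks than the I/O servers alone require. For example, a single rank with 72 I/O tasks gives -71, although that is only one combination consistent with the log and not a diagnosis.

```python
def compute_task_count(total_ranks, nio_groups, nio_tasks_per_group):
    """Ranks left for the model domain after reserving quilt (I/O) servers.

    Assumption: compute tasks = total MPI ranks - nio_groups * nio_tasks_per_group.
    """
    return total_ranks - nio_groups * nio_tasks_per_group


def can_build_mesh(total_ranks, nio_groups, nio_tasks_per_group):
    """True if at least one compute rank remains, so a 1 x 1 mesh is possible."""
    return compute_task_count(total_ranks, nio_groups, nio_tasks_per_group) >= 1


if __name__ == "__main__":
    # Hypothetical rank counts, chosen only to illustrate the arithmetic.
    for total in (1, 72, 128, 256):
        p = compute_task_count(total, nio_groups=6, nio_tasks_per_group=12)
        ok = can_build_mesh(total, nio_groups=6, nio_tasks_per_group=12)
        print(f"total_ranks={total:4d}  P={p:4d}  mesh_possible={ok}")
```

If this arithmetic holds, printing the rank count that each member's MPI launch actually receives would show whether the failing members start with fewer ranks than requested.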

Thanks,

Matt Wilson
 
Matt,
I ran a few test cases on Derecho and did not see the issue you describe. Could you please tell me the path where your case is located? Also, which version of WRF are you running? Thanks.
 
I'm having the same problem and it still hasn't been solved. May I ask how you solved it?
 
@Liuwh @MattW
If you run the failed members separately, do they complete successfully?
I suspect that this is a machine issue rather than something wrong in WRF.
 
A different random subset of members seems to fail each time, so we can eventually get all of the members by running our script multiple times. I've also submitted a ticket to CISL asking whether this is specific to Derecho, but I haven't received an answer yet.
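The rerun workaround above could be narrowed to only the members that hit this error. Below is a minimal sketch, assuming each member runs in its own subdirectory of an ensemble root and that the fatal message is written to rsl.error.0000 in that directory. The directory layout and the function name are hypothetical and would need adapting to the actual ensemble script.

```python
from pathlib import Path

# Text taken from the fatal message in the log above.
FATAL_MARKER = "UNABLE TO GENERATE PROCESSOR MESH"


def failed_members(ensemble_root, log_name="rsl.error.0000"):
    """Return the names of member directories whose log contains the fatal marker.

    Assumptions: one subdirectory per member under ensemble_root, and the
    rank-0 error log is named log_name inside each member directory.
    """
    failed = []
    for member_dir in sorted(Path(ensemble_root).iterdir()):
        log = member_dir / log_name
        if not log.is_file():
            continue  # member has not run yet, or logs were cleaned up
        if FATAL_MARKER in log.read_text(errors="replace"):
            failed.append(member_dir.name)
    return failed
```

Calling failed_members on the ensemble root after each pass returns the list of members to resubmit, so members that already succeeded are not run again.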
 
Thanks for the update. This confirms that the problem is a machine issue. Please keep me updated if you get an answer from CISL. Thanks in advance.
 