
Segmentation error/Cannot place all ranks on node list

bashaman

Hi,

I have previously run my WRF model successfully with my current settings, only with a smaller set of nested grids. After expanding the grids to cover a larger area, I am now running into new errors. I have followed "Choosing an Appropriate Number of Processors" as described in the FAQ. When I use more cores, I get an error in the wrf.o* file saying "Cannot place all ranks on node list", and when I use fewer cores, I run into a segmentation error.

I have attached files corresponding to the segmentation error occurrence. Please let me know if there's anything else I can attach that would be helpful. Thanks in advance for any help.

- Andrew
 

Attachments

  • new_grid_files.zip
    79.7 KB
Hi Andrew,
I see that you're running this on Derecho, which means you can use up to 128 processors per node. As a test, can you try using 1280 processors? To do that, set
Code:
#PBS -l select=10:ncpus=128:mpiprocs=128

and see if that makes any difference. With this many processors the distribution can be a near-square 32x40 processor grid, and a "more square" decomposition sometimes helps, but not always. If it doesn't, do you mind pointing me to the directory where you're running this, so I can take a closer look?
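
To see how a given rank count splits into a processor grid, here is a minimal Python sketch that finds the most nearly square factor pair. The function name `near_square_factors` is made up for illustration, and this is only an approximation of the idea: WRF's own choice may differ (it also considers minimum tile sizes), and the layout can be set explicitly with `nproc_x` and `nproc_y` in &domains.

```python
import math


def near_square_factors(nprocs: int) -> tuple[int, int]:
    """Return (nx, ny) with nx <= ny and nx * ny == nprocs, as close to square as possible.

    Illustrative helper only; it is not WRF's actual decomposition routine.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be a positive integer")
    # Walk down from floor(sqrt(nprocs)); the first divisor found gives
    # the factor pair with the smallest difference between nx and ny.
    for nx in range(math.isqrt(nprocs), 0, -1):
        if nprocs % nx == 0:
            return nx, nprocs // nx
    raise AssertionError("unreachable: 1 divides every positive integer")


if __name__ == "__main__":
    ranks_per_node = 128  # Derecho value quoted in this thread
    for nodes in (8, 10):
        total = nodes * ranks_per_node
        nx, ny = near_square_factors(total)
        print(f"{nodes} nodes -> {total} ranks -> {nx} x {ny}")
```

For 10 nodes (1280 ranks) this gives 32 x 40, and for 8 nodes (1024 ranks) it gives an exactly square 32 x 32.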
 
Hi, I realized that I had max_dom = 2 instead of max_dom = 3 in my namelist.input. It is good to know that Derecho has 128 processors per node.

Now I am running into a new error, "real_em.F: Could not find the parent domain", which appears to be an unresolved issue discussed in another post with the same title. Any insights into that problem would be helpful.

Thanks!
 
It's an interesting issue. If you're okay with sharing the path(s) to your running directories on Derecho, I can run my own test with your namelist and files to try to recreate the error. That would make it easier for me to troubleshoot.
 
Here are my running directories:
/glade/work/bashaman/ms_thesis/wrfv4.6.0/wpsv4.6.0
/glade/work/bashaman/ms_thesis/wrfv4.6.0/run
 
Thanks for sharing those. I believe the issue is that the parameter "grid_id" is missing from your namelist.input file. Add this to the &domains section and try running real again:

Code:
grid_id = 1, 2, 3
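
A quick pre-flight check can catch this kind of omission before running real. The Python sketch below is a simplified illustration, not a full Fortran namelist parser: the helper names (`parse_group`, `check_domains`), the choice of per-domain variables to check, and the sample grid sizes are all assumptions made for the example. It reports per-domain entries in &domains that are missing or have fewer values than `max_dom`.

```python
import re


def parse_group(text: str, group: str) -> dict[str, list[str]]:
    """Parse 'name = v1, v2, ...' lines from one &group ... / block (simplified)."""
    match = re.search(rf"&{group}\b(.*?)^\s*/", text, re.S | re.M | re.I)
    if match is None:
        raise ValueError(f"&{group} section not found")
    entries: dict[str, list[str]] = {}
    for line in match.group(1).splitlines():
        line = line.split("!", 1)[0]  # drop trailing Fortran-style comments
        if "=" not in line:
            continue
        name, values = line.split("=", 1)
        entries[name.strip().lower()] = [v.strip() for v in values.split(",") if v.strip()]
    return entries


def check_domains(text: str, per_domain=("grid_id", "parent_id", "e_we", "e_sn")) -> list[str]:
    """Return a list of problems found in the &domains section."""
    domains = parse_group(text, "domains")
    max_dom = int(domains["max_dom"][0])
    problems = []
    for name in per_domain:
        values = domains.get(name)
        if values is None:
            problems.append(f"{name} is missing")
        elif len(values) < max_dom:
            problems.append(f"{name} has {len(values)} value(s), max_dom is {max_dom}")
    return problems


# Hypothetical namelist fragment; the grid sizes are made up for the example.
SAMPLE = """
&domains
 max_dom   = 3,
 e_we      = 100, 151, 202,
 e_sn      = 100, 151, 202,
 parent_id = 0, 1, 2,
/
"""

if __name__ == "__main__":
    for problem in check_domains(SAMPLE):
        print(problem)
```

Run on the sample fragment, the check reports that `grid_id` is missing. It cannot tell whether `max_dom` itself is the value you intended, so that still needs a manual look.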
 