
MPT Error when running real.exe for a large domain at high resolution

This post was from a previous version of the WRF & MPAS-A Support Forum. New replies have been disabled; if you have follow-up questions related to this post, please start a new thread from the forum home page.

JuliaKukulies

New member
Real.exe does not finish when using MPI on Cheyenne (mpiexec_mpt real.exe). The error message I keep getting is:

MPT ERROR: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
aborting job MPT: Received signal 11


The real program works fine and creates the files wrfinput_d01, etc., for the same domain settings at 4 km, so it looks like a memory issue, but I cannot figure out how to solve it. I am running a 1 km WRF simulation with quite a large domain (5532 x 3276) and no nesting; see the attached namelist. I have already tried different numbers of processors (both fewer and more) and varying start times. I also checked the metgrid files, which all look fine, and made sure there is enough disk space to write the output.

The issue seems similar to https://forum.mmm.ucar.edu/phpBB3/viewtopic.php?t=8496, but I cannot find any CFL error. The rsl files for all processors are created, but no errors are reported (see attached rsl example - they all look the same).
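As a rough sanity check on the memory hypothesis (this arithmetic is not from the thread; the field count and bytes-per-value below are illustrative assumptions, and nz is guessed since the namelist is not shown here), the jump from 4 km to 1 km grid spacing multiplies the number of grid points by roughly 16, so per-node memory can become the limiting factor:

```python
import math

# Back-of-envelope memory estimate for the arrays real.exe must hold.
# Assumptions (NOT measured WRF figures): ~60 3-D fields, 4 bytes per
# value (single precision), nz = 50 vertical levels.
nx, ny, nz = 5532, 3276, 50
n_3d_fields = 60
bytes_per_value = 4

points_3d = nx * ny * nz
total_gb = points_3d * n_3d_fields * bytes_per_value / 1e9
print(f"3-D grid points: {points_3d:,}")
print(f"rough total for {n_3d_fields} 3-D fields: {total_gb:.0f} GB")

# Spread across N nodes, each node holds only its share of the domain
# (Cheyenne nodes have 36 cores and ~45-109 GB of usable memory).
for nodes in (10, 36, 80):
    print(f"{nodes:3d} nodes -> ~{total_gb / nodes:.1f} GB per node")
```

Even if the assumed field count is off by a factor of two, the estimate suggests that small node counts could plausibly run out of memory at this domain size, which is consistent with a signal 11 and empty rsl files.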

Any help would be appreciated :)
 

Attachments

  • namelist.input (5.1 KB)
  • rsl.error.2879.txt (867 bytes)
For large domains like 5532 x 3276, we often have trouble running both REAL and WRF.
Please try the following options and let me know whether they work for you:
(1) compile WRF in dmpar mode
(2) run real.exe with a large number of nodes (for example, 36 nodes)
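To see why more nodes helps, it is useful to look at how the domain is split among MPI tasks. The sketch below is a near-square factorization similar in spirit to WRF's default decomposition (the helper function is illustrative, not WRF source code); the per-task patch size shrinks as the task count grows, and with it the memory each task needs:

```python
import math

def default_decomposition(ntasks):
    """Factor ntasks into (nproc_x, nproc_y), nproc_x <= nproc_y,
    as close to square as possible (illustrative helper)."""
    nproc_x = int(math.sqrt(ntasks))
    while ntasks % nproc_x:
        nproc_x -= 1
    return nproc_x, ntasks // nproc_x

nx, ny = 5532, 3276                  # e_we, e_sn from the namelist
for nodes in (36, 80):
    ntasks = nodes * 36              # 36 cores per Cheyenne node
    px, py = default_decomposition(ntasks)
    print(f"{nodes} nodes ({ntasks} tasks): {px} x {py} decomposition, "
          f"patch ~{math.ceil(nx / px)} x {math.ceil(ny / py)} points")
```

The actual decomposition can be forced with nproc_x and nproc_y in the namelist, but the default near-square split is usually reasonable.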

We don't have much experience running WRF with such a large number of grid points. Please keep us updated on your results and progress. Thanks in advance.
 
Dear Ming,

Many thanks for the reply! I had already compiled WRF in dmpar mode and used a large number of nodes (even 80). I understand that the domain size is huge, so I will see whether I can try some more things and keep you posted.

/Julia
 
Hi Julia,
I once ran a test case with 2000 x 2000 grid points at 1 km resolution. I used 25 nodes and the case completed successfully. With a smaller number of nodes, the case crashed.
Your case is much larger than the one I tested. Please let me know where your case is located, and I will also try it. Thanks.
Ming
 
All files for the case are located in /glade/u/home/kukulies/WRF_simulations/1km_large-domain_data-files using the WRF compilation in /glade/work/kukulies/WRF_4.2.

Many many thanks for your help!
 
Julia,
I am sorry that I cannot make the case run, and I cannot figure out the reason yet. We don't have much experience running WRF over a domain with such a large number of grid points. I will talk to our software engineers and keep you updated if I get more information.
 
All right, Ming, thank you very much for the update and for your effort in trying! I tried the same domain with 2 km grid spacing instead but was not successful either.
 
 