MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 CREATE FROM 0

elad

New member
Hello there!

I'm trying to help a colleague of mine overcome this error, and I'd appreciate your input.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 CREATE FROM 0
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

Thank you,
E.
 

Attachments

  • namelist.wps
  • namelist.input
  • rsl.error.txt
  • rsl.out.txt

Hi,
The issue is likely that not enough processors are being used. Take a look at this FAQ, which discusses choosing an appropriate number of processors based on the size of the domain.
 
Thank you for your reply,
We are running this job with up to 96 CPUs, which should be enough, right?
I noticed it is crashing after "setting clock services according to namelist.rc"
 
Hi,
The estimates in the FAQ are only loose rules, and you may sometimes need more processors, depending on factors such as the resolution of your domain and the physics options used. 96 is probably the absolute minimum number of processors you could use, and you likely need more. You could probably use up to around 2000, but that many shouldn't be necessary either.
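
To make the kind of estimate described above concrete, here is a minimal Python sketch. The formula is an assumption (the commonly quoted WRF rule of thumb, not something stated in this thread): roughly one processor per 100 x 100 grid points at the low end and one per 25 x 25 grid points at the high end. The function name and the example dimensions are hypothetical; substitute the e_we and e_sn values from your own namelist.input.

```python
def processor_bounds(e_we: int, e_sn: int) -> tuple[int, int]:
    """Return a rough (fewest, most) processor range for one domain.

    ASSUMPTION: rule of thumb of one processor per 100x100 grid points
    at the low end and one per 25x25 grid points at the high end.
    These are loose guidelines, not hard limits.
    """
    points = e_we * e_sn
    fewest = max(1, -(-points // (100 * 100)))  # ceiling division
    most = max(1, points // (25 * 25))          # floor division
    return fewest, most


if __name__ == "__main__":
    # Hypothetical domain size, NOT taken from the attached namelist.input.
    fewest, most = processor_bounds(e_we=1000, e_sn=800)
    print(f"Rough range: {fewest} to {most} processors")
```

Because these are only loose rules, a job that fails near the lower bound may still need more processors, as noted above.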

I also notice your domain uses 3.33 km grid spacing. What is the resolution of your input data? You typically don't want the ratio of your input data's resolution to your domain's grid spacing to exceed about 5:1, so if your input is coarser than that, you may need to consider putting a larger, coarser parent domain around your domain of interest to buffer that difference.
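
A quick way to check the ratio is sketched below in Python. The 3.33 km grid spacing and the approximate 5:1 limit come from this thread; the function names and the two input resolutions are hypothetical placeholders, since the actual input data resolution has not been given.

```python
MAX_RATIO = 5.0  # approximate limit discussed above, not a hard rule


def resolution_ratio(input_dx_km: float, domain_dx_km: float) -> float:
    """Ratio of input data resolution to the domain's grid spacing."""
    return input_dx_km / domain_dx_km


def needs_parent_domain(input_dx_km: float, domain_dx_km: float,
                        max_ratio: float = MAX_RATIO) -> bool:
    """True if the input is coarse enough that a buffering parent domain may help."""
    return resolution_ratio(input_dx_km, domain_dx_km) > max_ratio


if __name__ == "__main__":
    domain_dx_km = 3.33  # grid spacing mentioned in the thread
    # Hypothetical input resolutions, for illustration only.
    for input_dx_km in (12.0, 27.0):
        ratio = resolution_ratio(input_dx_km, domain_dx_km)
        flag = needs_parent_domain(input_dx_km, domain_dx_km)
        print(f"input {input_dx_km} km -> ratio {ratio:.1f}:1, parent domain advised: {flag}")
```

With 3.33 km spacing, the 5:1 guideline puts the coarsest acceptable input at roughly 16.7 km.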
 