rsl.error.0008:Fatal error in MPI_Wait: Other MPI error, error stack:

Brian LuValle

Hello,
I am running WRF on an M1 Mac Studio. When I run wrf.exe, the program produces output, but after a certain amount of time it stops with the error in the thread title. I have attached my namelist.wps and namelist.input below.
Thank you,
Brian LuValle
 

Attachments: namelist.wps (733 bytes), namelist.input (3.7 KB)

Good morning Brian,

Can you also attach your rsl.error.* and rsl.out.* log files?
 
FYI, I reran the program, and this time it gives me: rsl.error.0001:Fatal error in MPI_Wait: Other MPI error, error stack:
I have attached all of the rsl.error and rsl.out files below.
 

Attachments: rsl_errors.TAR (8.4 KB), rsl_out.TAR (7.6 KB)

Thank you for these files. Either I or one of the admins will review them.

For some more information, can you run this command

Bash:
env | sort

in your terminal and put the output here in the thread?
 
env | sort
CC=gcc
CONDA_DEFAULT_ENV=base
CONDA_EXE=/Users/brianluvalle/miniconda3/bin/conda
CONDA_PREFIX=/Users/brianluvalle/miniconda3
CONDA_PROMPT_MODIFIER=(base)
CONDA_PYTHON_EXE=/Users/brianluvalle/miniconda3/bin/python
CONDA_SHLVL=1
CPPFLAGS=-I/USERS/brianluvalle/Build_WRF/LIBRARIES/grib2/include
CXX=g++
DIR=/USERS/brianluvalle/Build_WRF/LIBRARIES
DISPLAY=/private/tmp/com.apple.launchd.epzbufp6Ag/org.xquartz:0
F77=gfortran
FC=gfortran
FCFLAGS=-m64
FFLAGS=-m64 -fallow-argument-mismatch -O2
HOME=/Users/brianluvalle
HOMEBREW_CELLAR=/opt/homebrew/Cellar
HOMEBREW_PREFIX=/opt/homebrew
HOMEBREW_REPOSITORY=/opt/homebrew
INFOPATH=/opt/homebrew/share/info:
LANG=en_US.UTF-8
LDFLAGS=-L/USERS/brianluvalle/Build_WRF/LIBRARIES/grib2/lib
LOGNAME=brianluvalle
LaunchInstanceID=62A70083-A178-40D9-933D-9AC47E27F5D2
MANPATH=/opt/homebrew/share/man::
NETCDF=/USERS/brianluvalle/Build_WRF/LIBRARIES/netcdf
OLDPWD=/Users/brianluvalle/BUILD_WRF
PATH=/USERS/brianluvalle/Build_WRF/LIBRARIES/mpich/bin:/Users/brianluvalle/miniconda3/bin:/Users/brianluvalle/miniconda3/condabin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/Library/Apple/usr/bin
PWD=/Users/brianluvalle/BUILD_WRF/WRF/RUN
SECURITYSESSIONID=186a5
SHELL=/bin/zsh
SHLVL=1
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.LscDLiOgMU/Listeners
TERM=xterm-256color
TERM_PROGRAM=Apple_Terminal
TERM_PROGRAM_VERSION=445
TERM_SESSION_ID=69D10870-37D6-4F24-9A5B-F7A34E8841BA
TMPDIR=/var/folders/6d/vrq8z3mx43qc45sfmgs6vqbm0000gn/T/
USER=brianluvalle
XPC_FLAGS=0x0
XPC_SERVICE_NAME=0
_=/usr/bin/env
_CE_CONDA=
_CE_M=
__CFBundleIdentifier=com.apple.Terminal
netcdfclassic=1
 
Thanks for the information.

Did you have Conda installed prior to installing WRF, or did you install Conda afterwards?
 
I installed WRF first and then installed Conda, but I have rebuilt WRF since installing Conda.
Okay, so the issue might be that Conda is in your PATH and the Conda environment has its own mpich library installed.

Two things to try.

1. Activate your Conda environment (conda activate [name of environment]) and look for an mpich installation with the command
Bash:
which mpirun
If the result points into your Conda installation, that's the most likely culprit.

2. Try running WRF and WPS with the absolute path for mpich.

I'm assuming you are doing something like this:

Bash:
mpirun -np 4 ./real.exe   # 4 is only an example process count
mpirun -np 4 ./wrf.exe

If that's the case, try running it like this, replacing /path/to with the directory that holds your WRF build's mpirun:

Bash:
/path/to/mpirun -np 4 ./real.exe
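
To check whether more than one mpirun is visible, it can help to list every match on PATH rather than only the first one that `which` reports. The following is a minimal sketch; `list_on_path` is a hypothetical helper name, not part of WRF, mpich, or Conda.

```shell
# Print every executable named "$1" found on PATH, in search order.
# The first line printed is the one the shell would actually run.
list_on_path() {
  printf '%s\n' "$PATH" | tr ':' '\n' | while IFS= read -r dir; do
    if [ -x "$dir/$1" ]; then
      printf '%s\n' "$dir/$1"
    fi
  done
}

list_on_path mpirun
list_on_path mpiexec
```

If this prints more than one line for mpirun, a second MPI installation is reachable from the current shell, and calling the intended one by its absolute path removes the ambiguity.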

Let me know if that fixes the error
 
Inside the Conda environment, which mpirun returns the WRF mpirun: /USERS/brianluvalle/Build_WRF/LIBRARIES/mpich/bin/mpirun
I then ran
/USERS/brianluvalle/Build_WRF/LIBRARIES/mpich/bin/mpirun -np 1 ./real.exe
and
/USERS/brianluvalle/Build_WRF/LIBRARIES/mpich/bin/mpirun -np 16 ./wrf.exe

only to get a similar error: rsl.error.0009:Fatal error in MPI_Wait: Other MPI error, error stack:
 
Okay, try running WRF outside of the Conda environment, and use the absolute path to mpich again.

The issue could be that the computer sees two mpich libraries installed on the system and doesn't know which one to use. Inside the Conda environment it may try to use both, even if you give the absolute path.
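
Besides `conda deactivate`, one way to try this for a single command is to build a copy of PATH with the Conda directories filtered out. This is only a sketch: `path_without` is a hypothetical helper, and the substring `miniconda3` is taken from the Conda prefix in the env output earlier in the thread.

```shell
# Print the PATH-style string "$2" with every entry containing
# the substring "$1" removed. Nothing is changed permanently.
path_without() {
  printf '%s\n' "$2" | tr ':' '\n' | grep -vF -- "$1" | paste -sd ':' -
}

# Show what PATH would look like with the Conda directories hidden.
clean_path=$(path_without miniconda3 "$PATH")
echo "$clean_path"
```

The result can be used for one run, for example `PATH="$clean_path" /path/to/mpirun -np 2 ./wrf.exe`. This only hides Conda's executables from the shell; it does not change which libraries wrf.exe was linked against when it was built.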
 
I deactivated the conda environment and ran wrf.exe again and after producing 3 generations of output I received a similar error: rsl.error.0010:Fatal error in MPI_Wait: Other MPI error, error stack:
 
One last thing to try, and then we will have to have the admins @kwerner & @Ming Chen look into this.
Try running ./real.exe and ./wrf.exe with -np 2.

Sometimes using too many MPI processes can crash WRF.
 
That worked, thank you. I'm wondering, though: are there any workarounds for using more processors?
I believe somewhere in the FAQ section at the top of the homepage there is an article about how to calculate the maximum number of processors allowed for a run.
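
As a rough sketch of the rule of thumb usually cited for this, and assuming it matches that article: each processor's tile should be no smaller than about 25 grid points per side and no larger than about 100, so the bounds follow from e_we and e_sn in namelist.input. The function names are hypothetical, and the values 425 and 300 are illustrative, not taken from the attached namelist.

```shell
# Upper bound on MPI processes: tiles no smaller than roughly 25 x 25 points.
# $1 = e_we, $2 = e_sn (integer division, as in the rule of thumb)
max_procs() {
  echo $(( ($1 / 25) * ($2 / 25) ))
}

# Lower bound: tiles no larger than roughly 100 x 100 points, at least 1.
min_procs() {
  n=$(( ($1 / 100) * ($2 / 100) ))
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}

max_procs 425 300   # prints 204
min_procs 425 300   # prints 12
```

If the domain is small, the upper bound can fall below the 16 processes used earlier in the thread, which would be consistent with the run succeeding at -np 2. Making the domain larger (more grid points) is the usual way to make room for more processes.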
 