Hello forum,
I have a problem running WRF version 4.6.1 with Chem (WRF-Chem) on the Perlmutter machine. I have contacted NERSC support and they are helping us, but I would appreciate it if anyone here could share a solution or relevant experience.
ERROR in <rsl.error.0000>:
MPICH ERROR [Rank 0] [job id 36469101.0] [Mon Mar 3 23:58:18 2025] [nid004172] - Abort(1) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
sbatch scripts I used (two configurations):
Configuration 1 (2 nodes):
#SBATCH -N 2
srun -n 64 -c 8 --cpu_bind=cores ${wrfexe}
export OMP_NUM_THREADS= 4
Configuration 2 (1 node):
#SBATCH -N 1
srun -n 64 -c 4 --cpu_bind=cores ${wrfexe}
export OMP_NUM_THREADS= 2
Neither of them worked.
ERROR in <slurm.out>:
libgomp: Invalid value for environment variable OMP_NUM_THREADS:
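One detail in both scripts above may be worth checking: the export lines have a space after '='. The sketch below only illustrates how bash parses such a line, assuming the job scripts run under bash. It may account for the libgomp message (which shows an empty value after the colon), but not necessarily for the MPI_Abort in rsl.error.0000, which could have a separate, chem_opt-related cause.

```shell
#!/bin/bash
# Illustration (assumes bash) of how the export line from the job scripts is parsed.
# Each case runs in a child bash so the original line cannot affect this shell.

# Original form: the space after '=' gives export two arguments,
# "OMP_NUM_THREADS=" (exported as an empty string) and "4" (rejected as an
# invalid variable name; that error message is discarded here).
broken=$(bash -c 'export OMP_NUM_THREADS= 4 2>/dev/null; printf "%s" "${OMP_NUM_THREADS-unset}"')

# Corrected form: no whitespace around '='.
fixed=$(bash -c 'export OMP_NUM_THREADS=4; printf "%s" "${OMP_NUM_THREADS-unset}"')

echo "with space:    OMP_NUM_THREADS='${broken}'"
echo "without space: OMP_NUM_THREADS='${fixed}'"
```

In the job script itself, the corrected line (export OMP_NUM_THREADS=4, or =2 for the 1-node case) would also need to come before the srun line, since the ranks launched by srun only see the environment as it is at launch time.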
We ran several tests on other machines with the same settings, and they all worked properly. We also tested WRF without chemistry, and it works well. With chemistry, the model runs properly with chem_opt=195 but fails with chem_opt=198 and chem_opt=203, and I need to run with chem_opt=203. I was wondering what could cause this error.
I also attached my namelist and rsl.error files.
Thanks for your help.