wrf.exe not running on cluster & segmentation fault

husile

I am having the same problem: wrf.exe does not run on a cluster.

I compiled WRF with intel-oneapi-compilers/2021.4.0 and openmpi/4.1.1, using configure options 15 and 1. Depending on the number of nodes and tasks I set for the Slurm job, it either fails with a segmentation fault or does not run on the cluster at all.

I've attached the namelist and pasted the .csh job script below. No rsl.* files are created even while the job is still running, and the Slurm output file is also empty.

#!/bin/csh
#SBATCH --time=48:00:00 # walltime, abbreviated by -t
#SBATCH --nodes=4 # number of cluster nodes, abbreviated by -N
#SBATCH -o slurm-%j.out-%N
#SBATCH -e slurm-%j.err-%N
#SBATCH -J test_wrf
#SBATCH --ntasks=124 # number of MPI tasks, abbreviated by -n

unlimit stacksize
mpirun -np $SLURM_NTASKS ./real.exe
mpirun -np $SLURM_NTASKS ./wrf.exe
 


Are you ever able to get an rsl* file? You mentioned that it sometimes seg-faults. Do you get the error files in that case? If so, can you package those up and send them? Otherwise, this may be something you need to discuss with a systems administrator at your institution, since it's possibly related to the cluster.
 
Ah, yes, I have an rsl.error file now.

The rsl*.txt file is from the earlier run that ended in the segmentation fault. The segmentation fault does not seem to show up in the new run (rsl.error.0000).
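In a multi-process WRF run, the message explaining a crash is not necessarily in rsl.error.0000; another rank's file may hold it. A minimal sketch that lists which rsl.error.* files mention a segmentation fault. It is written for POSIX sh rather than csh like the job script above, the function name find_segfault_ranks is hypothetical, and the grep patterns are assumptions: the Intel Fortran runtime reports this kind of crash with a line containing `SIGSEGV, segmentation fault occurred`, but other runtimes may word it differently.

```shell
# find_segfault_ranks (hypothetical helper): print the name of every
# rsl.error.* file in a run directory that mentions a segmentation fault.
# Patterns are an assumption: "SIGSEGV" or "segmentation fault", any case.
find_segfault_ranks() {
  dir=${1:-.}
  for f in "$dir"/rsl.error.*; do
    # If no rsl files exist, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    if grep -qiE 'SIGSEGV|segmentation fault' "$f"; then
      basename "$f"
    fi
  done
}
```

Run it from the run directory as `find_segfault_ranks .`; an empty result means none of the rsl.error files matched these patterns, not necessarily that no rank crashed.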

Thank you very much for the quick response!
 


Thanks for sending those. In the Slurm script you pasted above, you're asking for 124 processors, but the rsl* files you sent show only 4. Do you get the same result when asking for multiple nodes (and more processors), or is that the case where you don't get any rsl files?
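WRF built for distributed memory (dmpar) writes one rsl.error.NNNN and one rsl.out.NNNN per MPI task, so counting the rsl.error files shows how many ranks actually started. A sketch under that assumption, in POSIX sh rather than csh; count_rsl_ranks and check_ranks are hypothetical helper names, and the expected count defaults to SLURM_NTASKS, which Slurm sets inside the job and which the script above already passes to mpirun.

```shell
# count_rsl_ranks (hypothetical): number of rsl.error.* files in a directory.
# Assumes one rsl.error.NNNN per MPI rank, as written by a dmpar WRF build.
count_rsl_ranks() {
  dir=${1:-.}
  n=0
  for f in "$dir"/rsl.error.*; do
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# check_ranks (hypothetical): compare that count with the tasks requested.
# Returns 0 on a match, 1 on a mismatch, 2 if the expected count is unknown.
check_ranks() {
  found=$(count_rsl_ranks "$1")
  expected=${2:-${SLURM_NTASKS:-}}
  if [ -z "$expected" ]; then
    echo "expected task count unknown (SLURM_NTASKS not set)" >&2
    return 2
  fi
  if [ "$found" -ne "$expected" ]; then
    echo "mismatch: $found rsl.error files, $expected tasks requested" >&2
    return 1
  fi
  echo "ok: $found ranks"
}
```

For example, `check_ranks . 124` in a run directory that holds only four rsl.error files reports a mismatch and returns 1.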
What happens if you just run 2 domains, or one?
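The number of domains WRF runs is set by max_dom in the &domains record of namelist.input. A minimal sketch of a hypothetical helper, set_max_dom, in POSIX sh, that rewrites that value and keeps the previous file as namelist.input.bak. It assumes max_dom sits on its own line in the usual `max_dom = N,` form and leaves every other line untouched.

```shell
# set_max_dom (hypothetical): set max_dom in a WRF namelist file.
# Usage: set_max_dom N [file]; file defaults to namelist.input.
set_max_dom() {
  n=$1
  file=${2:-namelist.input}
  # Keep a backup, then rewrite only the number after "max_dom =".
  cp "$file" "$file.bak" || return 1
  sed "s/^\([[:space:]]*max_dom[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\1$n/" \
    "$file.bak" > "$file"
}
```

For instance, `set_max_dom 1` in the run directory switches to a single-domain run; compare against namelist.input.bak to confirm only that line changed before resubmitting the job.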
 