wrf.exe not running on cluster & segmentation fault

husile

I am having the same problem: wrf.exe does not run on a cluster.

I compiled with intel-oneapi-compilers/2021.4.0 and openmpi/4.1.1, using configure options 15 and 1. Depending on the number of nodes and tasks I set for the Slurm job, wrf.exe either hits a segmentation fault or does not run on the cluster at all.

I've attached the namelists and pasted the .csh job commands here. No rsl.* file is created even while the job is still running, and the Slurm output file is empty.

#SBATCH --time=48:00:00 # walltime, abbreviated by -t
#SBATCH --nodes=4 # number of cluster nodes, abbreviated by -N
#SBATCH -o slurm-%j.out-%N
#SBATCH -e slurm-%j.err-%N
#SBATCH -J test_wrf
#SBATCH --ntasks=124 # number of MPI tasks, abbreviated by -n

unlimit stacksize
mpirun -np $SLURM_NTASKS ./real.exe
mpirun -np $SLURM_NTASKS ./wrf.exe
 

Attachments

  • namelist.input
    4 KB · Views: 2
  • namelist.wps
    832 bytes · Views: 2
Are you ever able to get an rsl* file? You mentioned that it sometimes seg-faults. Do you get the error files in that case? If so, can you package those up and send them? Otherwise, this may be something you need to discuss with a systems administrator at your institution, since it's possibly related to the cluster.
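
One way to package the rsl files for sending is sketched below. This is a minimal sketch, assuming the per-task logs follow the usual rsl.error.NNNN / rsl.out.NNNN naming and sit in the run directory; the function name `package_rsl` and the default archive name are hypothetical, not part of WRF.

```shell
#!/bin/sh
# Bundle all rsl.error.NNNN and rsl.out.NNNN files from a run directory
# into a single compressed archive written inside that directory.
# package_rsl and rsl_files.tar.gz are hypothetical names for illustration.
package_rsl() {
    run_dir=${1:-.}
    archive=${2:-rsl_files.tar.gz}
    (
        cd "$run_dir" || exit 1
        # Keep only per-task logs such as rsl.error.0000 or rsl.out.0003.
        files=$(ls | grep -E '^rsl\.(error|out)\.[0-9]+$' || true)
        if [ -z "$files" ]; then
            echo "no rsl files found in $run_dir" >&2
            exit 1
        fi
        # These file names contain no spaces, so word splitting is safe.
        tar -czf "$archive" $files
    )
}
```

Calling `package_rsl /path/to/run_dir` would leave rsl_files.tar.gz in that directory; a non-zero return means no rsl files were found there.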
 
Ah, yes, I have an rsl.error file now.

The rsl*.txt file is from the previous run, which ended in a segmentation fault. The fault does not seem to show up in the new run (rsl.error.0000).

Thank you very much for the quick response!
 

Attachments

  • rsl.error.0000
    128.2 KB · Views: 3
  • rsl.error.0000.txt
    13.9 KB · Views: 3
Thanks for sending those. I notice in the slurm script you pasted above, it looks like you're asking for 124 processors, but in the rsl* files you sent me, you're only asking for 4. Do you get the same result when asking for multiple nodes (and more processors), or is that the case where you don't get any rsl files?
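
Since WRF writes one rsl.error.NNNN file per MPI task, counting those files is a quick way to see how many tasks actually started. The sketch below assumes the run directory holds only rsl files from the current run (stale files from an earlier run would inflate the count); the function names `count_rsl_ranks` and `check_rank_count` are hypothetical.

```shell
#!/bin/sh
# Count the MPI tasks that started, using one rsl.error.NNNN file per task.
# Both function names are hypothetical helpers, not WRF or Slurm tools.
count_rsl_ranks() {
    run_dir=${1:-.}
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status at zero.
    ls "$run_dir" | grep -c -E '^rsl\.error\.[0-9]+$' || true
}

# Compare the number of tasks that started with the number requested.
check_rank_count() {
    expected=$1
    run_dir=${2:-.}
    actual=$(count_rsl_ranks "$run_dir")
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual tasks started"
    else
        echo "MISMATCH: expected $expected tasks, found $actual"
        return 1
    fi
}
```

Inside the job script, after the mpirun line, this could be called as `check_rank_count "$SLURM_NTASKS" .` to flag a run that started fewer tasks than requested.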
What happens if you just run 2 domains, or one?
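
For reference, the number of domains is controlled by `max_dom` in the `&domains` section of namelist.input. The fragment below is only a sketch of that one entry; the other `&domains` entries in the attached namelist are not reproduced here and would stay as they are.

```
&domains
 max_dom = 1,   ! run only the outermost domain; set to 2 to include the first nest
 ! ... remaining &domains entries unchanged ...
/
```

Both real.exe and wrf.exe read this value, so the test would rerun both with the reduced setting.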
 