WRF4.1.5, em_les, wrf.exe hangs

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

xyli00

Member
Dear everyone,

I tried to run a test of WRF4.1.5, em_les on a new cluster
and found that the execution of wrf.exe hangs.
Any idea what might cause this?
On my other machine, the test run finishes in 1 minute.

Here are the modules and compiler environment I am using:
module load intel/19.0.3
module load mvapich2/2.3.2
module load netcdf/4.4.1.1
export NETCDF=/share/apps/netcdf/4.4.1.1/intel/19.0.3/

$ ./configure
select "16" and "1"
$ ./compile em_les > log
$ sbatch job.sh

The content of job.sh is
#!/bin/bash -l

#SBATCH -A activate
#SBATCH -J wrf
#SBATCH --nodes=1
#SBATCH -n 8
#SBATCH -t 00:59:58
#SBATCH -o wrf.out
#SBATCH -e wrf.err

ulimit -s unlimited
srun ./wrf.exe

Best regards,

Xiang-Yu
 
Hi, Xiang,
Did you run ideal.exe first? I assume so, but I just want to make sure. Are there any error messages in your rsl files?
The compiler and libraries look fine.
If the same code runs on one machine but hangs on another, it often indicates a problem with either the libraries or the environment settings. Please consult your system administrator. It is hard to figure out the cause if we cannot reproduce the error.
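As a rough sketch of how the rsl files could be checked for error messages, assuming the usual WRF log names (rsl.error.NNNN and rsl.out.NNNN, one pair per MPI rank) in the run directory. The function name scan_rsl and the search patterns are only illustrative:

```shell
# Hypothetical helper: list rsl files that mention an error or fatal
# condition, then show the tail of rank 0's error log.
scan_rsl() {
    dir="${1:-.}"
    # -l prints only matching file names, -i ignores case, -E allows alternation
    grep -liE 'fatal|error|segmentation' "$dir"/rsl.error.* "$dir"/rsl.out.* 2>/dev/null || true
    if [ -f "$dir/rsl.error.0000" ]; then
        echo "--- last lines of rsl.error.0000 ---"
        tail -n 20 "$dir/rsl.error.0000"
    fi
}

# Usage, from the directory where wrf.exe was run:
#   scan_rsl .
```

If no file is flagged and the logs simply stop growing, that points toward a hang rather than a crash, and the last lines of each rank's log show how far that rank got.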
 
Hi Ming,

Thanks a lot for your reply.
I contacted my computing support center, but I have not heard back from them yet.

Yes, I ran ideal.exe first. Please find the error report attached.
The simulation had hung for about an hour before it was killed.
That is why the report shows an error saying that the process was killed.

Best,

Xiangyu
 

Attachments

  • rsl.txt (97.5 KB)
Xiangyu,
I believe this is a system issue. It is possibly related to your MPI installation or to the command used to launch the MPI job. The communication between your processors may also be an issue. Please talk to your system administrators to find a solution.
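A few checks along these lines can be gathered before contacting the administrators. The following is a minimal sketch, assuming a Linux cluster where ldd is available. The helper name check_mpi_env is hypothetical and nothing in it is specific to WRF:

```shell
# Hypothetical helper: report which MPI libraries an executable is linked
# against and which MPI launchers are visible in the current environment.
check_mpi_env() {
    exe="${1:-./wrf.exe}"
    echo "== MPI libraries linked into $exe =="
    if [ -x "$exe" ] && command -v ldd >/dev/null 2>&1; then
        ldd "$exe" | grep -i 'mpi' || echo "(no MPI library found in ldd output)"
    else
        echo "(cannot inspect $exe)"
    fi
    echo "== launchers on PATH =="
    for tool in srun mpirun mpiexec; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: $(command -v "$tool")"
        else
            echo "$tool: not found"
        fi
    done
}

# Usage, after loading the same modules that were used to compile:
#   check_mpi_env ./wrf.exe
```

If the libraries shown do not belong to the loaded mvapich2 module, the executable and the runtime may not match. On a Slurm system, `srun --mpi=list` prints the MPI plugin types that srun supports, and `srun -n 8 hostname` is a quick way to confirm that the launcher itself starts eight tasks. Whether either of these is the cause here is not established in this thread.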
 
Hi Xiang-Yu,

Since this post is from June, you may have already solved this problem. If not, have you tried increasing the walltime of the MPI run? The scheduler will kill the job when its run time exceeds the preset walltime.
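For reference, the requested walltime can be read back from the job script. This is a small sketch that assumes the limit is set with an `#SBATCH -t` or `#SBATCH --time` line, as in the job.sh posted above. The helper name show_walltime is hypothetical:

```shell
# Hypothetical helper: print the walltime requested in a Slurm job script.
show_walltime() {
    script="${1:-job.sh}"
    # Matches "#SBATCH -t VALUE", "#SBATCH --time=VALUE" and "#SBATCH --time VALUE"
    sed -n -E 's/^#SBATCH[[:space:]]+(-t|--time)[=[:space:]]+([^[:space:]]+).*/\2/p' "$script"
}

# Usage:
#   show_walltime job.sh
# To request a longer limit without editing the script, the option can be
# given on the command line, where it takes precedence over the directive:
#   sbatch --time=02:00:00 job.sh
```

The job.sh in this thread requests 00:59:58, which is consistent with the run being killed after about an hour.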
 