
SIGINT error when running WRF

This post was from a previous version of the WRF&MPAS-A Support Forum. New replies have been disabled and if you have follow up questions related to this post, then please start a new thread from the forum home page.

D_Fora

Hi,
I have WRF V4.3 built in sm+dm mode on a server (server1) with 2 CPUs of 28 cores each. The Intel Fortran compiler and Intel MPI are installed.

I wanted to do a run covering several days. Before running wrf.exe, I set OMP_NUM_THREADS to 4 and then ran the command "mpirun -np 24 ./wrf.exe" as root. Everything went well and the wrfout files were created as expected.
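As a quick sanity check on the hybrid (sm+dm) layout above, the number of threads requested is MPI ranks × OMP_NUM_THREADS. The Python sketch below compares that with the physical core count, assuming one hardware thread per core (hyper-threading is ignored); the function name `thread_budget` is hypothetical and only used for illustration.

```python
def thread_budget(ranks, omp_threads, sockets, cores_per_socket):
    """Return (total_threads, physical_cores, oversubscription_ratio).

    Assumes each OpenMP thread wants a full physical core and ignores
    hyper-threading; a ratio above 1.0 means more threads than cores.
    """
    total_threads = ranks * omp_threads
    physical_cores = sockets * cores_per_socket
    return total_threads, physical_cores, total_threads / physical_cores


if __name__ == "__main__":
    # Values from the post: 2 CPUs x 28 cores, OMP_NUM_THREADS=4, mpirun -np 24
    threads, cores, ratio = thread_budget(ranks=24, omp_threads=4,
                                          sockets=2, cores_per_socket=28)
    print(f"{threads} threads on {cores} cores (ratio {ratio:.2f})")
```

With these numbers the single-server run requests 96 threads on 56 physical cores. Since that run completed, this is a performance consideration rather than an explanation of the crash; a layout such as 14 ranks × 4 threads would match the 56 cores exactly.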

Since I have another server (server2) with the same configuration as server1, and both servers are on the same LAN, I want to run WRF using the CPUs of both servers. NFS and SSH were set up to connect the two servers.

As a test, I ran "mpirun -np 4 -f hostfile ./hello.exe" and got the message below:
# ----------------------------
Hello world: rank 0 of 4 running on Server1
Hello world: rank 1 of 4 running on Server2
Hello world: rank 2 of 4 running on Server1
Hello world: rank 3 of 4 running on Server2
# ----------------------------
where hostfile contains:
# ----------------------------
Server1
Server2
# ----------------------------
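The hello.exe output alternates between the two hosts. Assuming the launcher keeps cycling through the hostfile lines in the same order for a larger -np (an assumption based only on this test; Intel MPI's actual placement also depends on options such as -ppn), the Python sketch below shows how the 24 WRF ranks would be split between Server1 and Server2. The helper `round_robin_placement` is hypothetical.

```python
def round_robin_placement(num_ranks, hosts):
    """Map each MPI rank to a host by cycling through the hostfile lines.

    This only mimics the pattern seen in the hello.exe test (one rank per
    host per pass); it is an assumption, not a documented MPI placement rule.
    """
    return [hosts[rank % len(hosts)] for rank in range(num_ranks)]


if __name__ == "__main__":
    hosts = ["Server1", "Server2"]  # contents of the hostfile in the post
    for rank, host in enumerate(round_robin_placement(4, hosts)):
        print(f"rank {rank} of 4 running on {host}")
    # Ranks per host for the 24-rank WRF run under the same assumption
    placement = round_robin_placement(24, hosts)
    for host in hosts:
        print(host, placement.count(host))
```

Under that assumption each server would run 12 ranks, i.e. 48 threads with OMP_NUM_THREADS=4. Running hello.exe with -np 24 is a cheap way to confirm the real placement before launching wrf.exe.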

After this test succeeded, I reran the same case with the command "mpirun -np 24 -f hostfile ./wrf.exe", and it crashed quickly. Here is the end of the debug information in rsl.error.0000:
# ----------------------------
forrtl: error (69): process interrupted (SIGINT)
Image PC Routine Line Source
wrf.exe 0000000003236764 for__signal_handl Unknown Unknown
libpthread-2.28.s 0000149EC769C730 Unknown Unknown Unknown
wrf.exe 0000000000AE292C Unknown Unknown Unknown
wrf.exe 000000000041F834 Unknown Unknown Unknown
wrf.exe 0000000000419BE5 Unknown Unknown Unknown
wrf.exe 0000000000580CF0 Unknown Unknown Unknown
wrf.exe 0000000000416AB1 Unknown Unknown Unknown
wrf.exe 0000000000416A64 Unknown Unknown Unknown
wrf.exe 00000000004169E2 Unknown Unknown Unknown
libc-2.28.so 0000149EC710509B __libc_start_main Unknown Unknown
wrf.exe 00000000004168EA Unknown Unknown Unknown
# ------------------------------
Attachment: rsl.zip
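rsl.error.0000 only records what rank 0 saw. In a multi-process run, a SIGINT traceback like the one above may simply mean the process was interrupted from outside after something else went wrong, so the underlying cause can be recorded in a different rsl file. The Python sketch below scans every rsl.error.* and rsl.out.* file in a run directory and reports suspicious lines, skipping SIGINT lines so that a different error stands out. The keyword list and the helper name `scan_rsl` are assumptions for illustration, not an official WRF tool.

```python
import glob
import os
import re

# Keywords that often mark a real failure in rsl files (assumed list; extend as needed).
ERROR_PATTERN = re.compile(r"(FATAL|ERROR|forrtl|SIGSEGV|segmentation fault|cfl)",
                           re.IGNORECASE)


def scan_rsl(run_dir=".", tail=5):
    """Return {filename: lines} for each rsl.error.* / rsl.out.* file.

    Lines matching ERROR_PATTERN are kept, except those mentioning SIGINT
    (usually a symptom, not the cause). If nothing else matches, the last
    `tail` lines of the file are returned instead.
    """
    report = {}
    paths = sorted(glob.glob(os.path.join(run_dir, "rsl.error.*")) +
                   glob.glob(os.path.join(run_dir, "rsl.out.*")))
    for path in paths:
        with open(path, errors="replace") as fh:
            lines = [line.rstrip("\n") for line in fh]
        hits = [line for line in lines
                if ERROR_PATTERN.search(line) and "SIGINT" not in line]
        report[os.path.basename(path)] = hits if hits else lines[-tail:]
    return report


if __name__ == "__main__":
    for name, lines in scan_rsl().items():
        print(f"== {name}")
        for line in lines:
            print("   ", line)
```

Files whose report contains something other than the SIGINT traceback (for example an rsl file from a rank on server2) are the ones worth reading in full.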

I can't figure out what the problem is. Does anyone have an idea how to fix it?

Thanks a lot,
huangs
 
Update:
Other models, such as ROMS, ran successfully using the CPUs of both servers, so there is probably something wrong with my WRF installation.
 