
wrf.exe error: signal 6 (Aborted)

llsalazardom

New member
Good afternoon.

We are trying to run a nested simulation with four domains using ndown on an HPC system. So far we have successfully run d01, d02, and d03, but we fail to run d04.

We get the following error when we try to run d04:
mpirun noticed that process rank 1326 with PID 1300686 on node cn0265 exited on signal 6 (Aborted).

We don't know what is happening.
Please find attached the namelist files and rsl.error.0000.

Thanks for your help!
 

Attachments

  • namelist_wrf_d04_withouless.input (7.7 KB)
  • namelist.wps (1.4 KB)
  • wrf.log (231.3 KB)
Hi,
My guess is that you may be using too many processors. Take a look at Choosing an Appropriate Number of Processors to determine whether that is the case. I also recommend setting debug_level to 0; that option rarely provides useful information and just makes the rsl files very large and difficult to read.
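As a rough sketch, you can check both points from the run directory. Note that the SLURM_NTASKS variable, the exact numbers in the patch-size rule of thumb, and the sed command are assumptions here (adjust for your scheduler and namelist layout); debug_level normally lives in the &time_control section of namelist.input.

    # Compare the d04 grid size against the number of MPI tasks requested
    grep -E 'e_we|e_sn' namelist.input   # domain dimensions in grid points
    echo $SLURM_NTASKS                   # MPI tasks (Slurm example; use your scheduler's equivalent)
    # A commonly quoted guideline is to keep each processor's patch at least
    # ~10x10 grid points (preferably ~25x25 or more); asking for far more tasks
    # than that allows often ends in a segfault / signal 6.

    # Turn off verbose debugging in &time_control of namelist.input
    sed -i 's/^[[:space:]]*debug_level.*/ debug_level = 0,/' namelist.input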

Try running with fewer processors and with debug_level = 0. If it still fails, please attach the new namelist.input file, package all of the rsl.error.* files into a single *.tar file, and attach that as well. If the tar file is too large to attach, see the home page of this forum for instructions on sharing large files. Thanks!
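For the packaging step, a minimal example would be (the archive name is just a placeholder):

    # Bundle every rsl.error.* file into one tar archive for uploading
    tar -cf rsl_errors.tar rsl.error.*
    ls -lh rsl_errors.tar        # check the size before attaching
    # Optionally compress it if it is still large:
    gzip rsl_errors.tar          # produces rsl_errors.tar.gz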
 
Hello @kwerner

Thanks for your answer. We had a disk space problem because the rsl files were too big, so we deleted most of them, leaving only rsl.error.0000 and rsl.out.0000, and then the model ran successfully.

When we deleted the rsl files we wondered whether this would hinder the run in any way, since we removed them while the model was still running.

Do you think that deleting these logs could cause any problems for our simulation?
 
Deleting those files mid-run should not cause any issues. In the future, though, if you set debug_level = 0, the rsl files will be nowhere near as large as they were with it set to 9999, which should keep them from filling up your space. It sounds like the issue was probably disk space: by freeing up a large chunk of it, the model was able to complete and write the full output files. I'm glad to hear you got past the issue!
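If you want to confirm that interpretation next time, a quick way to watch disk usage in the run directory is sketched below; the file patterns and the Lustre quota command are only examples and depend on your system.

    # Free space on the filesystem holding the run directory
    df -h .
    # How much the logs and model output are consuming
    du -sh rsl.* wrfout* 2>/dev/null
    # On many HPC systems a quota check (e.g. 'lfs quota -h -u $USER /path/to/scratch'
    # on a Lustre filesystem) is more telling than df; check your site's documentation.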
 