
wrf.exe is getting the error YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9) after running for a few seconds

zoidimitriadou

Hello!
I'm writing this question after @kwerner prompted me to open a new thread for my issue (here). I'm new to this forum but have some experience with the model, although until now I have always worked with it under supervision. Currently, I am trying to run a simulation in order to later use WRFDA. All the executables have run successfully except wrf.exe. When I try to run it with the command mpirun -np 32 ./wrf.exe > wrf_run.log 2>&1 & , I get the error in my title. I have tried changing my timestep, as suggested in the thread linked above. I have also tried various ways of setting my vertical levels (you can see some of them commented out in my namelist.input). All of them lead to the same error, and between one and five of my rsl.error.* files contain the following error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:

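The message above is what the gfortran runtime prints when a process segfaults, and it lands only in the rsl.error file of the rank that crashed. A minimal sketch for listing which ranks hit it, assuming the standard rsl.error.NNNN naming in the run directory (the function name is hypothetical):

```python
import glob
import os

# gfortran runtime message quoted from the rsl.error files above.
SEGV_MARKER = "Program received signal SIGSEGV"


def ranks_with_segfault(run_dir="."):
    """Return sorted MPI ranks whose rsl.error.NNNN file reports a SIGSEGV."""
    ranks = []
    for path in glob.glob(os.path.join(run_dir, "rsl.error.*")):
        suffix = os.path.basename(path).rsplit(".", 1)[-1]
        if not suffix.isdigit():
            continue  # ignore files that are not rsl.error.<rank>
        with open(path, errors="replace") as fh:
            if any(SEGV_MARKER in line for line in fh):
                ranks.append(int(suffix))
    return sorted(ranks)
```

Calling print(ranks_with_segfault(".")) from the run directory lists the failing ranks, which makes it easier to compare runs with different settings.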
My system is an Ubuntu Linux server (x86_64 GNU/Linux) with 72 CPUs (Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz). Following the guidance on choosing an appropriate number of processors, I calculated that 32 is an appropriate number of processors for my case, and metgrid and real also run successfully with "mpirun -np 32 ./ ". I only get the above issue with wrf.exe.
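The processor calculation mentioned above can be sketched as follows. This assumes the rule of thumb often quoted on the WRF forum (roughly 100 grid points per tile side for the fewest processors and about 25 per side for the most); both thresholds are parameters, and the e_we/e_sn values in the example are hypothetical, not this poster's domain. Combining nested domains by taking the tightest bounds is also an assumption.

```python
def processor_range(e_we, e_sn, min_pts_per_side=25, max_pts_per_side=100):
    """Rough (fewest, most) MPI task counts for one WRF domain.

    Assumed rule of thumb: tiles should have at most ~max_pts_per_side and
    at least ~min_pts_per_side grid points along each side.
    """
    fewest = max(1, (e_we // max_pts_per_side) * (e_sn // max_pts_per_side))
    most = max(1, (e_we // min_pts_per_side) * (e_sn // min_pts_per_side))
    return fewest, most


def processor_range_all(domains):
    """Combine per-domain ranges for a nested run given [(e_we, e_sn), ...].

    Assumption: the same task count must suit every domain, so keep the
    largest lower bound and the smallest upper bound.
    """
    ranges = [processor_range(we, sn) for we, sn in domains]
    fewest = max(lo for lo, _ in ranges)
    most = min(hi for _, hi in ranges)
    return fewest, most
```

For example, processor_range(425, 300) gives (12, 204) under these defaults; if the combined range comes out with fewest greater than most, the domains are too different in size for a single count to satisfy this heuristic.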
I'm running the latest published version of both WRF and WPS (4.5), with input and boundary data from 0.25° GFS (NCEP Data Products GFS and GDAS) and SST_FIXED data from the same source (NCEP Data Products SST).
Any help would be greatly appreciated, since I cannot find what is causing it :)
Kindly,
Zoi Dimitriadou
 

Attachments

  • error_files_and_namelists.tar (1 MB)
Thanks for taking the time to post the new thread.
1) Since the model stops immediately, the problem could be related to the input. Take a look at your input (met_em) files and see if you notice anything odd; look at all levels.
2) Although your calculation of the number of processors is not wrong, you are probably using close to the bare minimum. Your case could likely use many more processors (up to about 500), which may help, although that may not be the cause. The WPS programs and real.exe can typically run with far fewer processors than wrf.exe.
3) I would recommend setting debug_level = 0. While this parameter seems like it would be useful, it rarely provides any useful information and mostly makes the rsl files large and difficult to read through.
4) You could try testing just domain 01 to see if that runs, and then add domain 02 if that works. This can help to narrow down which domain is causing the issue.
5) Another test is to use a default namelist with only your domain, date, and time settings, to see whether the issue comes from any of the namelist parameters you've added or changed.
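Suggestion 1 (look at every level of the met_em files) can be partly automated. A minimal sketch that flags levels of a (level, y, x) field containing NaN/inf or values outside a plausible range; the function name and the temperature bounds are illustrative assumptions, and reading the file assumes the netCDF4 and NumPy packages and the met_em variable TT:

```python
import math


def suspicious_levels(field, lo, hi):
    """Return {level_index: reason} for levels of a (level, y, x) field that
    contain non-finite values or values outside the range [lo, hi]."""
    problems = {}
    for k, level in enumerate(field):
        values = [v for row in level for v in row]
        if not values:
            continue  # nothing to check on an empty level
        if any(not math.isfinite(v) for v in values):
            problems[k] = "non-finite values"
        else:
            vmin, vmax = min(values), max(values)
            if vmin < lo or vmax > hi:
                problems[k] = f"range {vmin:g}..{vmax:g} outside [{lo:g}, {hi:g}]"
    return problems


# Hypothetical usage (file name and bounds are illustrative):
#   import numpy as np
#   from netCDF4 import Dataset
#   with Dataset("met_em.d01.<date>.nc") as ds:
#       tt = np.ma.filled(ds.variables["TT"][0], np.nan)  # (level, south_north, west_east)
#       print(suspicious_levels(tt, 150.0, 350.0))
```

An empty result only means nothing is obviously out of range; subtler problems such as a missing level still need a visual check.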
 
Hello again!
Thank you for your answer! I didn't want to reply to this thread until I had checked all the options you mentioned above. I checked them all, but sadly I still get the error. I also checked my input data, and everything seems correct, with nothing out of place (which would be surprising anyway, since my data come from GFS). The maximum number of processors available on my system is 72, but using all of them does not solve the issue either. I did set debug_level = 0. I also tried running with only one domain, and nothing changed except that the model crashes a few seconds sooner. Lastly, I took your advice and used the default namelist options (getting my namelist.input from here). I reran both WPS and WRF, but I still get this error:

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 538542 RUNNING AT gourounitsa
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.

Please see the FAQ page for debugging suggestions
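The title of this thread reports "Killed (signal 9)" while this run reports "Segmentation fault (signal 11)". These are different signals: SIGKILL (9) is sent to a process from outside, for example by the kernel's out-of-memory killer or by the MPI launcher while cleaning up the other ranks, whereas SIGSEGV (11) is raised by the process itself on an invalid memory access, matching the rsl.error message. A small sketch (hypothetical helper) that decodes the signal numbers in such exit strings with Python's standard signal module:

```python
import re
import signal


def decode_exit_string(text):
    """Return the POSIX signal name for every 'signal N' in an MPI exit string."""
    names = []
    for num in re.findall(r"signal (\d+)", text):
        try:
            names.append(signal.Signals(int(num)).name)
        except ValueError:  # number not defined on this platform
            names.append(f"unknown signal {num}")
    return names
```

On Linux, decode_exit_string("YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)") returns ['SIGKILL'].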

Do you have any other recommendations for my issue?
I'm also attaching my namelists and rsl* files.
Thank you so much for your time,
Zoi
 

Attachments

  • error_files_2nd_try.tar (1.6 MB)
Thanks for trying those things. Can you possibly send me the namelist.input, namelist.wps, and rsl* files for your test with the default namelist and only a single domain simulation? Thanks!
 
Thank you once more for your response! After several trials, I found that the issue was not my namelists but the initial setup of the libraries on my system. I started from scratch, set everything up again, and now everything is working fine, at least for the time being!
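The thread does not say which libraries were misconfigured. As one hedged first check after rebuilding, a sketch that runs the standard Linux ldd tool on wrf.exe and lists shared libraries it cannot resolve. It assumes the libraries (for example NetCDF or MPI) are linked dynamically; it will not catch statically linked libraries or a mismatch between the compilers and MPI used to build them. The function names are hypothetical.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the shared-library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_executable(path="./wrf.exe"):
    """Run ldd on the executable and return any unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return missing_libraries(result.stdout)
```

An empty list from check_executable() means every dynamic dependency resolved on the current LD_LIBRARY_PATH, not that the build is correct.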
 