'BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES' during wrf.exe

kinguT

Member
I successfully ran real.exe for four domains for the simulation period start_date = '2012-06-27_00:00:00' to end_date = '2012-09-30_18:00:00'. However, when I run 'mpirun -np 8 ./wrf.exe', the following error message appears:


===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 84908 RUNNING AT negusu-OptiPlex-3060
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions



Can anyone suggest how to solve this?

Thanks
 

Attachments

I have had the exact same issue for a few days now, and I've searched the forum in multiple ways but have not found a solution. All other executables have run successfully on my system using 32 cores. My system is a Linux Ubuntu server (x86_64 GNU/Linux), CPU(s): 72, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz. Following the guidance on choosing an appropriate number of processors, I found that 32 is a suitable count for my case, and both metgrid and real run successfully with "mpirun -np 32". I only get the above issue with wrf.exe. I have installed the latest available version of the model (4.5) for both WRF and WPS, and I use input and boundary data from GFS, with SST_FIXED also from GFS. I also used the Domain Wizard web tool to create my domains for WPS.
I've also noticed this
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

in 4 of my rsl.error.00* files (rsl.error.0007, rsl.error.0008, rsl.error.0016, and rsl.error.0023).

I had been running version 4.2 of the model before, but I'm new to running version 4.5 without help, so any advice would be greatly appreciated.
Best,
Zoi
 


Hi,

I'm encountering the same issue and am currently working on optimizing my time step. I recommend decreasing your time step to see if that resolves the problem; this adjustment worked for me. For example, try 90 instead of 108. Same for kingu: maybe try something less than 162, e.g. 135? Here I was using 5*dx.

Also, when fine-tuning the time step for each domain, don't hesitate to use the time_step_ratio.

Best,

Vazquez Ballesta Manuarii
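To make the rule of thumb above concrete, here is a small sketch (not part of WRF; the function names are illustrative) of Manuarii's 5*dx guideline, with the child-domain steps derived from parent_time_step_ratio:

```python
# Sketch of the "time_step ~ 5 * dx (km)" rule of thumb from the post
# above, and how each nest's effective step follows from
# parent_time_step_ratio. Function names are illustrative, not WRF API.

def suggested_parent_step(dx_km, factor=5):
    """Suggested parent-domain time step in seconds (factor * dx rule)."""
    return factor * dx_km

def per_domain_steps(parent_step, parent_time_step_ratio):
    """Effective time step for each domain, walking down the nest."""
    steps = []
    step = parent_step
    for ratio in parent_time_step_ratio:
        step = step / ratio  # ratio is 1 for the parent domain itself
        steps.append(step)
    return steps

# Example: a 27 km parent with a 1/3/3/3 nest
parent = suggested_parent_step(27)               # 135 s
print(per_domain_steps(parent, [1, 3, 3, 3]))    # [135.0, 45.0, 15.0, 5.0]
```

The usual WRF guidance is factor = 6 with dx in km; Manuarii's 5*dx is simply a more conservative choice.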
 
Thank you so much for your recommendation,
I changed the time step to 90 and the same error occurs; the only difference is that the error I mentioned above now appears in only 3 rsl.error.* files. I will try running with different time steps to see if that fixes it, as you suggested.
Kindly,
Zoi
 
OK, if you want, as an example from my configuration: I found that a time step of 10 s (with a slightly different time-step ratio, because inner domains under 1 km require a smaller time step) helps, and I have the following in my namelist:

&domains
time_step = 10,
time_step_fract_num = 0,
time_step_fract_den = 1,
max_dom = 5,
e_we = 106, 100, 100, 175, 169,
e_sn = 100, 100, 121, 178, 154,
e_vert = 46, 46, 46, 46, 46,
vert_refine_method = 0, 0, 0, 0, 0,

eta_levels(1:46) = 1.0000, 0.9987, 0.9974, 0.9962, 0.9949,
0.9924, 0.9899, 0.9859, 0.9809, 0.9759,
0.9709, 0.9659, 0.9606, 0.9520, 0.9427,
0.9326, 0.9219, 0.9077, 0.8932, 0.8769,
0.8656, 0.8574, 0.8462, 0.8351, 0.8235,
0.8113, 0.7958, 0.7756, 0.7494, 0.7133,
0.6742, 0.6323, 0.5876, 0.5406, 0.4915,
0.4409, 0.3895, 0.3379, 0.2871, 0.2378,
0.1907, 0.1465, 0.1056, 0.0682, 0.0332,
0.0000,


p_top_requested = 5000,
num_metgrid_levels = 38,
num_metgrid_soil_levels = 4,
dx = 9000, 3000, 1000, 333.333, 111.111,
dy = 9000, 3000, 1000, 333.333, 111.111,
grid_id = 1, 2, 3, 4, 5,
parent_id = 0, 1, 2, 3, 4,
i_parent_start = 1, 50, 30, 11, 75,
j_parent_start = 1, 35, 30, 27, 47,
parent_grid_ratio = 1, 3, 3, 3, 3,
parent_time_step_ratio = 1, 3, 3, 4, 3,
feedback = 1,
smooth_option = 0,
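One thing that is easy to get wrong when specifying eta_levels explicitly, as in the namelist above, is a mismatch with e_vert or a non-monotonic value. A few lines of Python (just a convenience sketch, not a WRF tool) can validate the list:

```python
# Sanity check (illustrative, not part of WRF) for an explicit
# eta_levels list: the count must equal e_vert, and the values must
# decrease strictly from 1.0 at the surface to 0.0 at the model top.

def check_eta_levels(eta, e_vert):
    assert len(eta) == e_vert, f"expected {e_vert} levels, got {len(eta)}"
    assert eta[0] == 1.0 and eta[-1] == 0.0, "eta must span 1.0 .. 0.0"
    assert all(a > b for a, b in zip(eta, eta[1:])), "eta must strictly decrease"
    return True

# The 46 values from the namelist above
eta = [1.0000, 0.9987, 0.9974, 0.9962, 0.9949,
       0.9924, 0.9899, 0.9859, 0.9809, 0.9759,
       0.9709, 0.9659, 0.9606, 0.9520, 0.9427,
       0.9326, 0.9219, 0.9077, 0.8932, 0.8769,
       0.8656, 0.8574, 0.8462, 0.8351, 0.8235,
       0.8113, 0.7958, 0.7756, 0.7494, 0.7133,
       0.6742, 0.6323, 0.5876, 0.5406, 0.4915,
       0.4409, 0.3895, 0.3379, 0.2871, 0.2378,
       0.1907, 0.1465, 0.1056, 0.0682, 0.0332,
       0.0000]

print(check_eta_levels(eta, 46))  # True
```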
 
I have tried time steps of 108, 90, 72, 54, 36, 18, and even 10 s, and I keep getting the same error. So unfortunately I don't think it is a time-step issue.
 
Hi @zoidimitriadou and @Manuarii
The error message "BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES" simply means the simulation failed for some reason. Even if the rsl files don't seem to reveal any specific errors, your reasons for failure are likely all very different; therefore it would be best if you each post a new thread to discuss your issue if you are still experiencing it. Please make sure to include your namelist.input file and all of your rsl files in that post as well. Thank you, and I apologize for the inconvenience.
 
Thanks, Manuarii, for your recommendation.
However, I found that reducing the time_step does not resolve my problem.
 
Unfortunately, there isn't an alternative to that other than adding processors. In the rsl* files you sent at the beginning, the model seems to stop almost immediately, so I don't see where it ran for 6 hours. Even so, this can still be a lack of processors. Since your d01 is smaller than d03, you could try running a single-domain simulation to see whether that fails. Although 8 processors is very few, I think that would still be able to handle d01's size. If that works, you could then try d02, and then d03, until you find which domain causes the failure. You could also try using smaller versions of all 4 domains to see if you're able to run that; if so, it would point even more strongly to an issue with the number of processors.

Another thing I notice is that your d01 is using a resolution of 27km, which is probably too coarse, depending on the resolution of the input data you're using. What is the resolution of your input data?
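The processor-count guidance referenced in this thread can be sketched numerically. Treat the 25 and 100 patch sizes below as the commonly cited forum rule of thumb, not official limits:

```python
# Rough bounds on MPI task counts for a WRF domain, following the
# commonly cited heuristic: at most about one task per 25x25 patch of
# grid points, and at least one per 100x100 patch. The exact patch
# sizes are rules of thumb, not hard limits.

def proc_bounds(e_we, e_sn):
    """Return (fewest, most) suggested MPI tasks for one domain."""
    fewest = max(1, (e_we // 100) * (e_sn // 100))
    most = max(1, (e_we // 25) * (e_sn // 25))
    return fewest, most

# Example: a 100 x 100 domain
print(proc_bounds(100, 100))  # (1, 16)
```

By this measure, 8 tasks is within range for a small d01, but a large inner domain may want far more.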
 
Thank you, kwerner.
I use the NCEP Final Analysis (GFS-FNL) with 1-degree spatial resolution.
 
Thanks. I suppose then it makes sense to use a 27km parent domain; however, you may want to consider using a higher-resolution input (e.g., GFS 0.25 degree data). It may not make much of a difference, but we typically advise to use the highest resolution option available.
 
Thanks, kwerner.
 
I am trying to run the coupled WRF/WRF-Hydro model with the Crocus option enabled. I have completed the ./real.exe step and generated three files: wrfinput_d01, wrfinput_d02, and wrfbdy_d01. But now I am encountering a segmentation fault (core dumped) error while executing ./wrf.exe. I have generated all the required WRF-Hydro input files using the WRF-Hydro GIS Preprocessor with geo_em.d02.nc. The files I created include: fulldom_hires.nc, GEOGRID_LDASOUT_Spatial_Metadata.nc, GWBASINS.nc, GWBUCKPARM.nc, hydro2dtbl.nc, Route_Link.nc, soil_properties.nc.
I placed all these files in the WRF run directory along with the Domain folder. However, when I execute mpirun -np 1 ./wrf.exe, I get the error:
(ncl_env) sagar@sagar-OptiPlex-Tower-Plus-7010:~/data/coupled_wrfhydro_Copy/WRF/run$ mpirun -np 1 ./wrf.exe
starting wrf task 0 of 1

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 18119 RUNNING AT sagar-OptiPlex-Tower-Plus-7010
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
I have attached the error log files along with my namelist.input and hydro.namelist.
Could you please help me identify what might be causing this issue, and what the namelist.input and hydro.namelist should look like for two domains? I would greatly appreciate your guidance.

Regards
Sagar Lamichhane



This is probably happening because you are running with only a single processor (mpirun -np 1).
To make sure your WRF-Hydro input files are correct, I suggest running WRF-Hydro in standalone mode first before moving to the fully coupled WRF/WRF-Hydro run. That way you can confirm the hydro domain and routing inputs work properly and isolate whether the issue is coming from coupling.


Also, to get more feedback from people who work specifically on WRF-Hydro, I recommend posting this on the WRF-Hydro user group as well: https://groups.google.com/a/ucar.edu/g/wrf-hydro_users
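Before relaunching, it may also help to confirm that the WRF-Hydro inputs listed above are actually visible from the run directory. A minimal sketch (the file names are taken from the post above; the directory layout is an assumption about your setup):

```python
# Check that the WRF-Hydro input files named in the post are present
# in the run directory before launching wrf.exe.
import os

REQUIRED = [
    "fulldom_hires.nc",
    "GEOGRID_LDASOUT_Spatial_Metadata.nc",
    "GWBASINS.nc",
    "GWBUCKPARM.nc",
    "hydro2dtbl.nc",
    "Route_Link.nc",
    "soil_properties.nc",
]

def missing_inputs(directory="."):
    """Return the required WRF-Hydro files not found in `directory`."""
    return [f for f in REQUIRED
            if not os.path.exists(os.path.join(directory, f))]

missing = missing_inputs()
if missing:
    print("Missing before wrf.exe:", ", ".join(missing))
```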
 