Segmentation fault (signal 11)

Hi,

I tried to run WRF with three domains and GFS0p25 for initialization. The resolutions are 9 km, 3 km, and 1 km, as shown in the namelist.input file attached. The namelist.wps, domain figure, and all rsl.error and rsl.out files are also attached.

To clarify, the innermost domain focuses on an inlet that covers complex topography. I attempted to run WRF with the following command:

$ mpirun -np 8 ./wrf.exe

However, it encountered the following error.

"""
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 3068012 RUNNING AT klinaklini.unbc.ca
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

"""

I suspect this might be related to the high resolution of the innermost domain and its complex topography. Ideally, we need to run WRF at a higher resolution, like 300 m, but we are currently stuck at 1 km. I am unsure how to resolve this issue.

Also, I found the following CFL error:

"""
rsl.error.0005:d03 2019-02-01_12:00:47+02/09 278 points exceeded v_cfl = 2 in domain d03 at time 2019-02-01_12:00:47+02/09 hours
rsl.error.0005:d03 2019-02-01_12:00:47+02/09 Max W: 331 201 3 W: ******* w-cfl: 75.86 dETA: 0.01
rsl.out.0005:d03 2019-02-01_12:00:47+02/09 278 points exceeded v_cfl = 2 in domain d03 at time 2019-02-01_12:00:47+02/09 hours
rsl.out.0005:d03 2019-02-01_12:00:47+02/09 Max W: 331 201 3 W: ******* w-cfl: 75.86 dETA: 0.01

"""

I checked a thread that mentioned a similar issue (Segmentation fault (signal 11)). Based on the discussion in that thread, I tried changing the time step from 30 to 5 seconds, but the problem wasn't resolved.
I don't think changing only the time step will solve this issue. For a resolution of 1 km, even with a 5-second time step, the issue persists. So, what about higher resolutions, like 300 m?

As this task is essential for my thesis, it might require some back-and-forth communication to solve it. I appreciate your patience in advance and look forward to your assistance.

Sincerely,
Ehsan
 

Attachments

  • WRFForum241028.zip
    538.6 KB · Views: 5
In error file #5


Code:
d03 2019-02-01_12:00:47+02/09          278  points exceeded v_cfl = 2 in domain d03 at time 2019-02-01_12:00:47+02/09 hours
d03 2019-02-01_12:00:47+02/09 Max   W:    331    201      3 W: *******  w-cfl:   75.86  dETA:    0.01
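To see how widespread the CFL violations are, the rsl files can be scanned automatically. The following is only a minimal sketch: the function names are hypothetical, and the regular expressions assume the log lines look like the two shown above.

```python
import re
from collections import Counter

# Assumed line formats, taken from the two sample lines in this thread.
CFL_POINTS = re.compile(r"(\d+)\s+points exceeded v_cfl\s*=\s*[\d.]+\s+in domain (d\d+)")
W_CFL = re.compile(r"w-cfl:\s*([\d.]+)")


def count_cfl_points(lines):
    """Sum the number of points reported as exceeding v_cfl, per domain."""
    totals = Counter()
    for line in lines:
        match = CFL_POINTS.search(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return totals


def max_w_cfl(lines):
    """Return the largest w-cfl value found, or None if there is none."""
    values = [float(m.group(1)) for m in map(W_CFL.search, lines) if m]
    return max(values) if values else None


sample = [
    "d03 2019-02-01_12:00:47+02/09          278  points exceeded v_cfl = 2 in domain d03 at time 2019-02-01_12:00:47+02/09 hours",
    "d03 2019-02-01_12:00:47+02/09 Max   W:    331    201      3 W: *******  w-cfl:   75.86  dETA:    0.01",
]
print(count_cfl_points(sample), max_w_cfl(sample))
```

For a real run, the lines of each rsl.error.* file could be passed to the same functions, for example by opening the files returned by glob.glob("rsl.error.*").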

So here are a few recommendations.

1. w_damping = 0, ---> w_damping = 1,
2. epssm = 0.9, 0.9, 0.9 ! time off-centering for vertical sound waves (in the dynamics section; I used this over Nepal and the Himalayas)
3. etac = 0.02 (in the dynamics section; I used this over Nepal and the Himalayas)


Try #1 first with the original time step; if that doesn't work, try #1-3 together with the original time step.
 
see also this post

 
Thank you, Dear Will.

The following changes helped, and the model ran successfully.

set w_damping = 1,
set epssm = 0.5

However, it seems that setting epssm = 0.5 works well, and there is no need to set w_damping = 1.

As stated in the WRF User Guide, the value should not exceed 0.5, so I set it to 0.5.

I have a few queries and would appreciate some assistance:

1. In the link you provided (Segmentation Faults - Helpful Information), various suggestions are mentioned, including reducing time_step, adding smooth_cg_topo = .true., setting epssm = 0.2 (up to 0.9), and setting w_damping = 1. My question is, which one has priority and affects the results the least? For example, if setting epssm = 0.5 works, should I also change w_damping = 1? Or, if setting w_damping = 1 allows me to run with epssm = 0.3, is that better, or should I only set epssm = 0.5 with w_damping = 0? Considering the complexity of my domain, should I add smooth_cg_topo = .true. even though the model runs with the changes mentioned above? I should mention that my domain is quite complex, with slopes greater than 70 degrees.

2. The link you provided mentions setting setenv MP_STACK_SIZE 64000000 and using the command ulimit -s unlimited. I also heard about the following commands for dm+sm:

export KMP_STACKSIZE=500000000
export OMP_NUM_THREADS=32


I would like to know if I need to use the command ulimit -s unlimited every time I run ./wrf.exe, as well as the other mentioned environment settings. Obviously, other environment settings could be placed in the .bashrc file. Should I do that to set them permanently?

3. The link mentions the number of processors. I would like help determining the optimum (or minimum and maximum) number of processors I could use. In the previously attached file, there are namelist.wps and namelist.input settings with e_we = 214, 391, 586 and e_sn = 138, 226, 334. Also, the outputs of the commands lscpu and cat /proc/cpuinfo are attached here. I tried the formula in the link ((e_we)/25) * ((e_sn)/25) and ((e_we)/100) * ((e_sn)/100), and I also tried the Python script from the link, but the results were different.

Apologies for the lengthy queries. I have some confusion and hope to get some assistance.

Sincerely,
Ehsan
 

Attachments

  • cpuinfo.txt
    50 KB · Views: 0
  • lscpu.txt
    2.9 KB · Views: 0
1. In the link you provided (Segmentation Faults - Helpful Information), various suggestions are mentioned, including reducing time_step, adding smooth_cg_topo = .true., setting epssm = 0.2 (up to 0.9), and setting w_damping = 1. My question is, which one has priority and affects the results the least? For example, if setting epssm = 0.5 works, should I also change w_damping = 1? Or, if setting w_damping = 1 allows me to run with epssm = 0.3, is that better, or should I only set epssm = 0.5 with w_damping = 0? Considering the complexity of my domain, should I add smooth_cg_topo = .true. even though the model runs with the changes mentioned above? I should mention that my domain is quite complex, with slopes greater than 70 degrees.
I'm not really sure which one affects the results the most, so I'll let @kwerner answer that one because they know the settings better than I do. You could always try the different configurations and see which one matches the observations best.
I would like to know if I need to use the command ulimit -s unlimited every time I run ./wrf.exe, as well as the other mentioned environment settings. Obviously, other environment settings could be placed in the .bashrc file. Should I do that to set them permanently?

I personally don't like putting things into .bashrc because it affects everything globally. That particular command shouldn't affect other programs in general, but unless you are certain that putting the exports into .bashrc won't affect other programs, it's best to export only when you need it.
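One way to keep these settings local to a single run, rather than global, is a small launcher script. This is a rough sketch, not a tested recipe: the function names are hypothetical, the values are the ones quoted in this thread, and it assumes a Unix system where mpirun starts the ranks on the local machine, so that they inherit the raised stack limit.

```python
import os
import resource
import subprocess


def wrf_run_env(base_env, omp_threads, kmp_stacksize):
    """Return a copy of base_env with the threading variables set for one run."""
    env = dict(base_env)
    env["OMP_NUM_THREADS"] = str(omp_threads)
    env["KMP_STACKSIZE"] = str(kmp_stacksize)
    return env


def raise_stack_limit():
    """Raise the soft stack limit to the hard limit for the child process."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))


def launch_wrf(n_ranks=8):
    """Hypothetical launcher: the settings apply to this run only."""
    env = wrf_run_env(os.environ, omp_threads=32, kmp_stacksize=500000000)
    return subprocess.run(
        ["mpirun", "-np", str(n_ranks), "./wrf.exe"],
        env=env,
        preexec_fn=raise_stack_limit,  # runs in the child just before mpirun starts
        check=False,
    )
```

Calling launch_wrf() from the run directory would then start the model with these settings, while the login shell and other programs stay unchanged.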



The link mentions the number of processors. I would like help determining the optimum (or minimum and maximum) number of processors I could use. In the previously attached file, there are namelist.wps and namelist.input settings with e_we = 214, 391, 586 and e_sn = 138, 226, 334. Also, the outputs of the commands lscpu and cat /proc/cpuinfo are attached here. I tried the formula in the link ((e_we)/25) * ((e_sn)/25) and ((e_we)/100) * ((e_sn)/100), and I also tried the Python script from the link, but the results were different.

That formula works well on HPC systems that have hundreds of cores available. For example, the formula recommends 64 cores for the domain I normally run, but my machine physically doesn't have that many, so I run with what I have available. On my desktop I use only 50% of the cores, because I have noticed that the desktop runs slower and I can't multitask on it if I am using 100% of the cores.
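For reference, the two formulas quoted in the question can be evaluated for each domain in a few lines. This is just a sketch of the arithmetic: the function name is hypothetical, and treating the /100 formula as a lower bound and the /25 formula as an upper bound is an assumption about how the linked post intends them to be read. Which domain each bound should be taken from is described in that post.

```python
def processor_bounds(e_we, e_sn):
    """Evaluate the two rule-of-thumb formulas for one domain."""
    lower = (e_we / 100) * (e_sn / 100)  # assumed lower bound
    upper = (e_we / 25) * (e_sn / 25)    # assumed upper bound
    return lower, upper


# e_we and e_sn values quoted in this thread
domains = {"d01": (214, 138), "d02": (391, 226), "d03": (586, 334)}

for name, (e_we, e_sn) in domains.items():
    lower, upper = processor_bounds(e_we, e_sn)
    print(f"{name}: roughly {lower:.1f} to {upper:.1f} processors")
```

In practice the result is capped by the number of physical cores on the machine, as described above.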


Hope that answers your questions.
 
Thank you Dear Will,
Your replies are very helpful.

I have another query that I would appreciate some help with:
4. Considering ( dx = 9000, 3000, 1000 ), could I set time_step = 60, which is slightly more than 6*dx? Do you have any suggestions for the best choice of time_step?

Also, any other comments about all four queries are very welcome.

Sincerely,
Ehsan
 
The highest you could go is 54, since 6 * 9 = 54 (dx in km), but I would use 45 so that it divides evenly into an hour.
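The same reasoning can be written as a short calculation. This is a minimal sketch with a hypothetical function name; it assumes that "fits into hourly steps" means the time step divides 3600 seconds evenly, and it applies the 6 * dx rule with dx in km for the outermost domain.

```python
def time_step_candidates(dx_km, factor=6, seconds_per_hour=3600):
    """Integer time steps (s) no larger than factor * dx_km that divide an hour evenly."""
    upper = int(factor * dx_km)
    return [dt for dt in range(upper, 0, -1) if seconds_per_hour % dt == 0]


# Outermost domain in this thread: dx = 9 km, so the upper limit is 54 s
candidates = time_step_candidates(9)
print(candidates[:5])
```

With dx = 9 km, both 60 and 54 are excluded (60 is above the limit and 54 does not divide 3600), while 45 is among the candidates.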
 
Hi,
I also have a similar problem. I tried to solve the CFL error with
1. w_damping = 1,
2. epssm = 0.8
3. smooth_cg_topo = .true.
4. ulimit -s unlimited
but none of them solved the problem.
I would like to know whether the CFL error depends on the simulation domain setup or is a system error. How should I distinguish the cause of the error? Does the CFL error occur when the grid size decreases? In my simulation, I set up the domains from 1 km down to 37 m.
I have attached the namelist files and the error logs.
Any advice or comments are appreciated.
Thank you in advance for your time.
 

Attachments

  • namelist.input
    11.8 KB · Views: 4
  • namelist.wps
    920 bytes · Views: 1
  • rsl.error.0000
    434.8 KB · Views: 1
  • rsl.out.0000
    434.2 KB · Views: 1
Since this issue is on a different system, can you open a new thread in the WRF section of the forum? The admins like to keep posts separate.
 
I just want to check: since the resolution for the 2nd and 3rd domains is 3 km and 1 km respectively, should I turn off the cumulus scheme?

To clarify, as mentioned above, the model runs well, but I want to be sure if turning off the cumulus scheme for these resolutions is better or makes no difference.

Thanks in advance.
 
Yes, please turn off the cumulus scheme for D02 and D03.
Also make sure your time step is equal to or smaller than 6 * DX (time step in seconds, DX in km).
 
Dear Ming,

I appreciate your helpful reply.

Regarding the time step, since the inner domain has a resolution of 1 km, do you mean I should choose a time step of 6 seconds or less?

What about higher resolutions? For example, if the innermost domain has a resolution of 333 meters or 111 meters, could you suggest the best option for the time step?

Additionally, I would appreciate your input on the w_damping parameter. I plan to run WRF in real mode (not ideal) for simulating past events (not operational forecasting). Should I set w_damping to 0 or 1?

Thanks in advance.
 
Please see my answers below:
Regarding the time step, since the inner domain has a resolution of 1 km, do you mean I should choose a time step of 6 seconds or less?

time_step is always specified according to the resolution of your outermost domain. It is then calculated for each child domain based on parent_time_step_ratio. For example, if your time_step = 180 and you have parent_time_step_ratio = 1, 3, 3, then the time steps for D02 and D03 will be 60 and 20.
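The arithmetic in this example can be sketched as follows. The function name is hypothetical, and the sketch assumes each domain is nested inside the previous one (D03 inside D02 inside D01), as in this thread.

```python
def nest_time_steps(time_step, parent_time_step_ratio):
    """Time step of each domain, assuming each domain's parent is the previous one."""
    steps = []
    current = time_step
    for index, ratio in enumerate(parent_time_step_ratio):
        if index > 0:
            current = current / ratio  # child step = parent step / ratio
        steps.append(current)
    return steps


print(nest_time_steps(180, [1, 3, 3]))  # the example above
print(nest_time_steps(45, [1, 3, 3]))   # a 9 km / 3 km / 1 km setup with time_step = 45
```

So with time_step = 45 for a 9 km outer domain and ratios of 1, 3, 3, the 1 km domain would run with a 5-second step without being set explicitly.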

What about higher resolutions? For example, if the innermost domain has a resolution of 333 meters or 111 meters, could you suggest the best option for the time step?

Please see my answer above. You don't need to explicitly give a time step for a child domain.
Additionally, I would appreciate your input on the w_damping parameter. I plan to run WRF in real mode (not ideal) for simulating past events (not operational forecasting). Should I set w_damping to 0 or 1?

Please set w_damping = 1, which suppresses vertical motions to make the model numerically stable.
 
I have a query about the namelist.input settings, specifically smooth_option and smooth_cg_topo. For a complex topography with steepness greater than 70 degrees, is there a difference in activating or deactivating these options in namelist.input? I read a post mentioning that smooth_cg_topo has slight impacts over relatively flat terrain. Since my research focuses on channelized wind, which is significantly affected by topography, would setting smooth_option = 0 and smooth_cg_topo = .false. be better choices for my study?

Another query is about radt. I have chosen dx, dy = 9000, 3000, 1000. What are your thoughts on setting radt = 10, 10, 10?

Sincerely
 