solution of "segmentation fault" for high resolution simulations over India

ssk351

New member
Dear WRF Team,


I hope this email finds you well. My name is Saikrishna, and I am currently engaged in the simulation of heavy rainfall events over India using the WRF model. My objective is to analyze both local-scale and large-scale features during these events.

In my endeavor, I have attempted to simulate the events utilizing both single-domain (with resolutions of 3km, 1.5km, and 1km) and nested-domain setups (with resolutions of 6x3km and 2.5x1.5km). Encouragingly, the simulation with a single domain (4.5km resolution) proved successful.

To ensure accuracy, I designed the domain boundaries carefully so that they do not fall on complex topography. However, at every resolution finer than 4.5km, I encountered a recurring "Segmentation fault" error.

After thorough research, I identified several potential causes for this error, including:

  • Improper processor allocation or excessive processor utilization during decomposition
  • Storage and memory constraints (addressed by setting [ulimit -s unlimited])
  • Input data inaccuracies
  • CFL (Courant-Friedrichs-Lewy) instability issues, potentially mitigated by adjusting the time step
  • Parameter adjustments such as Etac values (0.0001, 0.01, 0.02, and 0.05) and Epssm range (0.2 – 0.9)
In light of these challenges, I seek your expertise in proposing viable solutions to overcome this error and achieve high-resolution domains (ranging from 3km to 0.5 meters) covering India.

Your guidance and insights would be immensely valuable in advancing my research. I eagerly await your response.

Thank you for your time and consideration.
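
For readers following the CFL item in the list above, here is a minimal sketch of the usual time-step check. It assumes the common WRF rule of thumb that `time_step` (in seconds) should be no larger than about 6 times the grid spacing in km; the function name and the list of grid spacings are illustrative, taken from the resolutions mentioned in this post.

```python
# Rule-of-thumb upper bound on the WRF time step:
# roughly 6 seconds per km of horizontal grid spacing (an assumption, not a hard limit).
def max_time_step_s(dx_km: float, factor: float = 6.0) -> float:
    """Return the suggested maximum time step in seconds for a grid spacing in km."""
    return factor * dx_km

# Grid spacings mentioned in the post (km).
grid_spacings_km = [4.5, 3.0, 1.5, 1.0]
suggested = {dx: max_time_step_s(dx) for dx in grid_spacings_km}

for dx, dt in suggested.items():
    print(f"dx = {dx} km -> time_step <= about {dt:.0f} s")
```

If a run still fails with CFL errors at these values, a smaller factor (for example 4 or 3 instead of 6) is a common next thing to try.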
 

Attachments

  • domain1.png (197.3 KB)
  • domain2.png (163.3 KB)
  • domain3.png (181.1 KB)
Hi,
Apologies for the delay. Can you let me know what is the resolution of your input data (that you feed into the ungrib program)? Can you also please attach the namelist.input file for one of the failed cases, along with all of the rsl* files? Just package up your rsl* files into a single *.tar file and attach that, along with the namelist. Thanks!
 
@ssk351
Just going to add some ideas to try after you do what kwerner asked.

Some settings that helped me deal with the mountains were:

epssm = 0.9,
etac = 0.02,
w_damping = 1,
diff_opt = 2,
km_opt = 4,

These are just settings that I have used and that might work. Again, @kwerner is the expert; I'm just a user.
 
Hi Kwerner,
Thanks for the response.
I have attached the rsl files and namelist.input.
 

Attachments

  • rsl.file.tar.gz (6.4 KB)
  • namelist.input (4.8 KB)
Thanks for your reply.
Yes, I agree with you, but I have already tried those options too.
 
Hi,
First, I'd like to apologize for the even longer delay in response this time. We've been busy trying to prepare for our upcoming code release and have gotten quite behind on forum posts. I think the issue is almost certainly due to the size of your domain, vs the number of processors you're using. Your domain is very large (1800 x 1600) and you will need to use a lot more than 48 processors for that. See Choosing an Appropriate Number of Processors.
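
As a rough illustration of why 48 processors is too few for this domain, here is a small sketch. It assumes a rule of thumb in which each processor's tile should be no larger than about 100 x 100 grid points and no smaller than about 25 x 25; those two limits and the function name are assumptions for illustration, so check the linked guidance for the exact recommendation.

```python
# Estimate a plausible range of MPI task counts for a WRF domain.
# Assumption: tiles between ~25x25 (smallest) and ~100x100 (largest) grid points.
def processor_range(e_we: int, e_sn: int,
                    max_tile: int = 100, min_tile: int = 25) -> tuple[int, int]:
    """Return (fewest, most) processors suggested for an e_we x e_sn domain."""
    fewest = max(1, (e_we // max_tile) * (e_sn // max_tile))
    most = max(1, (e_we // min_tile) * (e_sn // min_tile))
    return fewest, most

# Domain size quoted in this post.
fewest, most = processor_range(1800, 1600)
print(f"Suggested range: {fewest} to {most} processors")
print(f"48 processors is {'below' if 48 < fewest else 'within'} that range")
```

Under these assumptions, the 1800 x 1600 domain works out to roughly 288 to 4608 processors, so 48 is well short of the lower end.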

Another issue may be that your domain is set to a resolution of 3km. Depending on the resolution of your input data, this may be too fine. The resolution of the input data should be no more than about 5 times the grid spacing of your WRF domain (roughly a 5:1 ratio).
 
Hi
Thanks for your support.
1. Yes, it could be because of the processor decomposition. I have already tried different numbers of processors (256, 384, 480, 576, 720, 1440, and 1920), but with no luck; the same error appears each time.
2. Regarding resolution, our input data has a resolution of 25km, so by the 5:1 rule the simulation grid spacing should be no finer than 5km. However, it's worth noting that a simulation at 4.5km ran successfully.

Your insights will be invaluable in refining our approach and resolving this issue effectively.
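
To make the arithmetic in point 2 concrete, here is a small sketch that computes the input-to-grid ratio for each grid spacing discussed in this thread, using the 25km input resolution stated above. The function name and the 5:1 threshold constant are illustrative; the threshold is the approximate guideline quoted earlier, not a hard limit.

```python
# Ratio of input-data resolution to WRF grid spacing, compared with the ~5:1 guideline.
def input_to_grid_ratio(input_res_km: float, dx_km: float) -> float:
    """Return how many times coarser the input data is than the model grid."""
    return input_res_km / dx_km

INPUT_RES_KM = 25.0   # input data resolution stated in this post
MAX_RATIO = 5.0       # approximate guideline, not a strict limit

ratios = {dx: input_to_grid_ratio(INPUT_RES_KM, dx) for dx in (4.5, 3.0, 1.5, 1.0)}
for dx, ratio in ratios.items():
    status = "within" if ratio <= MAX_RATIO else "exceeds"
    print(f"dx = {dx} km: ratio {ratio:.2f}:1 ({status} the ~5:1 guideline)")
```

The 4.5km run sits only slightly above 5:1 (about 5.6:1), while 3km, 1.5km, and 1km are progressively further out (about 8.3:1, 16.7:1, and 25:1).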
 
I think the 5:1 rule is not very strict!
 
@ssk351
1) When you tried using a lot more processors (e.g., 1920), did the model stop after only a couple of time steps, like it did when you only used 48 processors?
2) As @haiqingsong points out, the 5:1 ratio rule is not exact. It's an estimated ratio that you should try to stay within. Although you were able to successfully run a 4.5km simulation, it's still quite possible you will need a parent domain around your 3km, 1.5km, and 1km domains. Can you give that a try and see if it helps any? For example, you could test the 3km simulation with a 9km parent surrounding it, using a 3:1 parent_grid_ratio.
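
When setting up the suggested 9km/3km nest, a quick sanity check like the sketch below may help. It relies on the WRF requirement that a nest's `e_we` and `e_sn` each equal a multiple of `parent_grid_ratio` plus 1; the helper names are hypothetical, and 1800 and 1600 are used only because they are the domain dimensions mentioned earlier in the thread.

```python
# Helpers for a parent/nest setup with a given parent_grid_ratio.
def parent_dx_km(nest_dx_km: float, ratio: int) -> float:
    """Grid spacing of the parent domain for a given nest spacing and ratio."""
    return nest_dx_km * ratio

def is_valid_nest_dim(e_dim: int, ratio: int) -> bool:
    """A nest's e_we / e_sn must be of the form n * ratio + 1."""
    return e_dim > 1 and (e_dim - 1) % ratio == 0

def nearest_valid_nest_dim(e_dim: int, ratio: int) -> int:
    """Round e_dim down to the nearest value of the form n * ratio + 1."""
    return ((e_dim - 1) // ratio) * ratio + 1

ratio = 3
print(f"Parent spacing for a 3 km nest: {parent_dx_km(3.0, ratio)} km")
for e_dim in (1800, 1600):
    ok = is_valid_nest_dim(e_dim, ratio)
    fixed = nearest_valid_nest_dim(e_dim, ratio)
    print(f"e_dim = {e_dim}: valid = {ok}, nearest valid value = {fixed}")
```

Rounding down rather than up is just a design choice here, so that the adjusted nest never grows beyond the size originally planned.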
 
1) Yes, the model crashed after a few time steps with every number of processors I tried (48, 196, 256, 384, 480, 576, 720, 1440, and 1920).
2) I have tried a two-domain nested setup (4.5 x 1.5 km), and it also crashes after a few time steps. However, I have not yet tried the 9 x 3 km combination that you are suggesting.
 
It would be good to know the outcome, since I think I am experiencing something similar.
Very much appreciated.
 