Segmentation fault

Shaivi Shukla

When I run with up to the first three domains, everything works fine. However, when I add the fourth domain, I initially received a CFL error. Because my study area is near steep terrain, I addressed it using the suggestions in related forum threads. Even after incorporating that solution, I still get a segmentation fault. I have also tried varying the number of processors.
 

Attachments

  • Files.zip (15.2 KB)
Hi,
First I'd like to apologize for the delay in response. Our support staff have been out of the office much of this past month. Are you still experiencing this issue? If so, can you attach all of your rsl files (packed in a zipped or .tar file) so we can take a look? Also, when you ran with only 3 domains, was everything identical to the namelist you attached, but just using max_dom = 3, instead of 4?
 
Yes, everything was the same when I ran the 3-domain case. (However, even with 3 domains, after 25 to 30 days of simulation the model stopped writing output while the WRF job kept running; when it reached the duration specified in the batch script, it ended with a wallclock time-limit error. In that case I used the restart option to obtain results for the complete time period.)

For the 4-domain case, I ran the model with the following configurations:

1. A namelist identical to the 3-domain run, changing only max_dom = 4 instead of 3: in this case the model crashes immediately.

2. A namelist with a few changes suggested in the forum (with the feedback mechanism turned on): in this case it also crashed.

3. The same inputs as case 2, but with the feedback mechanism turned off: in this case the model wrote output for 3 days, then stopped writing output, although the job kept running until the time specified in the job submission script.
 

Attachments

  • Feedback mechanish turned on.zip (142.1 KB)
  • Feedback mechanism turned off.zip (3.1 MB)
  • inputs similar to 3 domain.zip (94.3 KB)
Thanks for that information and for sending those files. It looks like CFL errors show up in the rsl* files both for the case with feedback on and for the case with inputs similar to the 3-domain run (you can find these by issuing "grep cfl rsl*"). See "What is the most common reason for a segmentation fault?", which discusses ways to try to overcome CFL errors.
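If there are many rsl.* files, the grep above can be extended slightly to count CFL-related lines per file and to pull out the grid indices WRF reports. The Python sketch below is hypothetical: the helper name scan_rsl is made up, and it assumes the vertical-CFL diagnostics contain a "MAX AT i,j,k:" token followed by three integers, as they commonly do; adjust the pattern if your rsl files differ. The reported i,j values can then be compared with the nest's e_we/e_sn to see whether the problem points sit near a domain edge.

```python
import glob
import os
import re
from collections import Counter

# ASSUMPTION: vertical-CFL diagnostics contain "MAX AT i,j,k:" followed by
# three integer grid indices. Adjust the pattern if your rsl files differ.
MAX_AT = re.compile(r"MAX AT i,j,k:\s+(\d+)\s+(\d+)\s+(\d+)")


def scan_rsl(directory="."):
    """Count CFL-related lines per rsl.* file and collect reported (i, j, k).

    Hypothetical helper for illustration; matching of "cfl" is
    case-insensitive, unlike a plain `grep cfl`.
    """
    counts = Counter()
    points = []
    for path in sorted(glob.glob(os.path.join(directory, "rsl.*"))):
        with open(path, errors="replace") as fh:
            for line in fh:
                if "cfl" in line.lower():
                    counts[os.path.basename(path)] += 1
                match = MAX_AT.search(line)
                if match:
                    points.append(tuple(int(v) for v in match.groups()))
    return counts, points


if __name__ == "__main__":
    counts, points = scan_rsl()
    for name, n in counts.most_common():
        print(f"{name}: {n} CFL-related lines")
    for i, j, k in points[:10]:
        print(f"vertical CFL maximum reported at i={i}, j={j}, k={k}")
```

Run it from the directory containing the rsl files; the files with the most hits usually point to the MPI tasks whose patches cover the unstable region.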

As for the case that runs longer but stops writing output, I'm not sure why this is happening. However, since the rsl* files don't show a successful completion, the model is probably no longer running, or it is hanging until you run out of wallclock time.

This won't fix the CFL errors, but if you have more processors available to use, you may want to try using more. You could try using something like 36 or 64.
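A commonly quoted rule of thumb on this forum bounds the processor count for a domain between (e_we/100)*(e_sn/100) and (e_we/25)*(e_sn/25). The sketch below only encodes that arithmetic; the function name and the example dimensions are hypothetical, each ratio is rounded down, and choosing a count that falls inside every domain's range is a suggested starting point rather than a hard requirement.

```python
def processor_range(e_we, e_sn):
    """Rule-of-thumb (least, most) MPI task counts for one domain.

    Encodes (e_we/100)*(e_sn/100) and (e_we/25)*(e_sn/25), rounding each
    ratio down and never returning fewer than 1 task.
    """
    least = max(1, (e_we // 100) * (e_sn // 100))
    most = max(1, (e_we // 25) * (e_sn // 25))
    return least, most


if __name__ == "__main__":
    # Hypothetical domain sizes (e_we, e_sn); substitute your own namelist values.
    domains = {"d01": (200, 180), "d04": (301, 301)}
    for name, (e_we, e_sn) in domains.items():
        least, most = processor_range(e_we, e_sn)
        print(f"{name}: roughly {least} to {most} processors")
```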
 
I had already incorporated the suggestions mentioned in "What is the most common reason for a segmentation fault?", but none of them resolved my error. I have tried numbers of processors ranging from 8 to 384, but nothing worked. My child domain is over hilly terrain, and I have also tried the suggestions in the forum threads related to complex terrain, but I am still not able to resolve the problem.
 
@Shaivi Shukla

Have you tried adding etac to your dynamics values?

Code:
etac                                = 0.02
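In the hybrid vertical coordinate (hybrid_opt = 2), etac sets where model levels stop following the terrain. Per the WRF v4 technical description, the dry pressure on a level is p_d = B(eta)(p_s - p_t) + (eta - B(eta))(p_0 - p_t) + p_t, so B(eta) controls how strongly the surface pressure, and therefore the terrain, shapes that level. The sketch below evaluates B for a few etac values; it is written from that published formula rather than taken from the WRF source, and hybrid_b is a hypothetical name.

```python
def hybrid_b(eta, etac=0.2):
    """Weight B(eta) of WRF's hybrid vertical coordinate (sketch).

    Levels with eta < etac are purely isobaric (B = 0). Below that, B is
    the cubic satisfying B(etac) = 0, B'(etac) = 0, B(1) = 1, B'(1) = 1.
    """
    if eta < etac:
        return 0.0
    d = (1.0 - etac) ** 3
    c1 = 2.0 * etac**2 / d
    c2 = -etac * (4.0 + etac + etac**2) / d
    c3 = 2.0 * (1.0 + etac + etac**2) / d
    c4 = -(1.0 + etac) / d
    return c1 + c2 * eta + c3 * eta**2 + c4 * eta**3


if __name__ == "__main__":
    # Compare how much terrain influence (B) remains at a few levels.
    for etac in (0.2, 0.1, 0.02):
        row = ", ".join(f"B({e})={hybrid_b(e, etac):.3f}" for e in (0.15, 0.3, 0.6))
        print(f"etac={etac}: {row}")
```

Smaller etac values keep B nonzero to higher levels, so terrain-following behavior extends further up the column.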

Looking at your namelist.wps (below), it looks like you are running a very large 27 km domain, followed by 9 km, 3 km, and 1 km nests.

I'd double-check with @kwerner, but perhaps you could adjust the size of your domains?

[Screenshot of namelist.wps: Screenshot from 2024-07-10 04-27-03.png]
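When resizing or moving nests, WRF requires each nest's e_we and e_sn to be an integer multiple of its parent_grid_ratio plus 1, and a nest's dx is its parent's dx divided by that ratio. The sketch below checks both for a simple chain of nests (each nest's parent is the previous domain). check_nest_chain and the dimensions in the example are hypothetical, since the actual values are only in the screenshot.

```python
def check_nest_chain(dx_d01, ratios, dims):
    """Return each domain's dx and a list of nest-size problems.

    ratios: parent_grid_ratio per domain (d01's entry is ignored)
    dims:   (e_we, e_sn) per domain
    Assumes each nest's parent is the previous domain (a simple chain).
    """
    dx = [float(dx_d01)]
    problems = []
    for n in range(1, len(dims)):
        ratio = ratios[n]
        dx.append(dx[-1] / ratio)
        for name, size in zip(("e_we", "e_sn"), dims[n]):
            # Nest dimensions must be an integer multiple of the ratio, plus 1.
            if (size - 1) % ratio != 0:
                problems.append(f"d{n + 1:02d}: {name}={size} is not n*{ratio}+1")
    return dx, problems


if __name__ == "__main__":
    # Hypothetical 27/9/3/1 km setup; replace with your namelist.wps values.
    dx, problems = check_nest_chain(
        27000, [1, 3, 3, 3], [(150, 130), (181, 160), (301, 271), (300, 301)]
    )
    print("dx (m):", dx)
    print("problems:", problems or "none")
```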
 
@William.Hatheway

After your reply, I also tried adding etac = 0.02, but the same problem still occurs.

Regarding the size of the 1st domain: my model runs well with the outer 3 domains (i.e., excluding the 1 km child domain). I then tried running only the inner 3 domains (i.e., 9 km, 3 km, 1 km), and in that case the model crashes. So I don't think the size of the 27 km domain is causing the problem; the problem appears when I add the 1 km child domain.

Your expertise and insights would greatly benefit my research. I look forward to your response.

Thank you for your time and consideration.
 
Unfortunately the area of your d04 has very complex terrain, and it's always tough to keep the model stable there. I see that you tried the suggestions, such as adding epssm, smooth_cg_topo, and w_damping in your namelist. Did you try to reduce the time_step significantly? For example, try setting it to something like 90, or even less if necessary.
 
Yes sir, I have tried running with various time_step values (e.g., 150, 120, 90, 60), but nothing worked.
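Because each nest's time step is its parent's divided by parent_time_step_ratio, the d01 values tried above translate into much smaller steps on the 1 km domain. The usual guideline of time_step of about 6*dx (in km) for d01 gives roughly 162 s at 27 km, so 90 s is already well below it. The sketch below assumes parent_time_step_ratio = 3 for every nest (this is not stated in the thread) and also computes a simple Courant number; the helper names and the 10 m/s updraft through a 30 m layer are hypothetical. It illustrates how a strong updraft through a thin lowest layer can still give a large vertical Courant number even with a small time step.

```python
def nest_time_steps(d01_time_step, parent_time_step_ratios):
    """Time step (s) of each domain in a simple nest chain."""
    steps = [float(d01_time_step)]
    for ratio in parent_time_step_ratios[1:]:
        steps.append(steps[-1] / ratio)
    return steps


def courant(speed_ms, dt_s, spacing_m):
    """Courant number u*dt/dx (or w*dt/dz for the vertical direction)."""
    return speed_ms * dt_s / spacing_m


if __name__ == "__main__":
    # ASSUMED parent_time_step_ratio = 3 for every nest.
    for dt in (150, 120, 90, 60):
        steps = nest_time_steps(dt, [1, 3, 3, 3])
        print(f"time_step={dt}: d04 dt = {steps[-1]:.2f} s")
    # Hypothetical numbers: a 10 m/s updraft crossing a 30 m thick layer on d04.
    print("vertical Courant:", round(courant(10.0, 90 / 27, 30.0), 2))
```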
 
Hi,
Okay here are a few other things you can try.

1. Rerun real.exe with one of these parameter sets for a different lowest-layer thickness (these parameters generate smoothly varying vertical layers):

Code:
e_vert                              = 51,
dzbot                               = 50.
dzstretch_s                         = 1.11
dzstretch_u                         = 1.06
p_top_requested                     = 5000,

or

Code:
e_vert                              = 51,
dzbot                               = 30.
dzstretch_s                         = 1.11
dzstretch_u                         = 1.08
p_top_requested                     = 5000,

2. Use diff_opt = 1 rather than 2.
3. Set etac = 0.15 or 0.1 (the default value is 0.2), and move the model top higher, to 20 mb or 10 mb (p_top_requested = 2000 or 1000). Note that this must be done prior to running real.exe, and it may require different values for the parameters above.
4. Try to move inner domains 3 and 4 somewhat by changing their starting indices (i_parent_start, j_parent_start). This may help if the blow-up happens near a domain boundary; without knowing where it happens, it's difficult to say whether it will.
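To get a feel for how dzbot, the stretch factors, and p_top_requested interact with e_vert, the sketch below builds a crude thickness profile and counts how many layers it takes to reach the model top. This is a simplified, hypothetical model, not the algorithm real.exe uses: the switch height between dzstretch_s and dzstretch_u, the 7.5 km isothermal scale height, the 1000 m thickness cap, and the function name are all assumptions, so treat the layer counts only as a rough comparison between the settings above.

```python
import math


def approx_layers(dzbot, dzstretch_s, dzstretch_u, p_top_pa,
                  switch_height_m=2000.0, max_dz=1000.0,
                  scale_height_m=7500.0, p_surf_pa=1.0e5):
    """Very rough layer-thickness profile for the stretching parameters.

    SIMPLIFIED, HYPOTHETICAL model (not real.exe's algorithm): thickness grows
    geometrically by dzstretch_s below switch_height_m and by dzstretch_u
    above it, capped at max_dz, until an isothermal estimate of the
    model-top height is reached.
    """
    # Isothermal atmosphere: z = H * ln(p_surf / p_top).
    z_top = scale_height_m * math.log(p_surf_pa / p_top_pa)
    layers = []
    z, dz = 0.0, float(dzbot)
    while z < z_top:
        layers.append(dz)
        z += dz
        stretch = dzstretch_s if z < switch_height_m else dzstretch_u
        dz = min(dz * stretch, max_dz)
    return z_top, layers


if __name__ == "__main__":
    for dzbot, s, u in ((50.0, 1.11, 1.06), (30.0, 1.11, 1.08)):
        for p_top in (5000.0, 2000.0, 1000.0):
            z_top, layers = approx_layers(dzbot, s, u, p_top)
            print(f"dzbot={dzbot:.0f} p_top={p_top:.0f} Pa: top ~{z_top / 1000:.1f} km, "
                  f"~{len(layers)} layers (e_vert = 51 gives 50)")
```

If the estimated layer count is well above e_vert - 1, the chosen stretching cannot reach the requested top smoothly, which is one reason a higher model top may need different values for these parameters.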
 