SIGSEGV, Segmentation fault occurred while running WRF 4.4 for nested real case

6575

New member
Greetings all,

I frequently run into this problem when running nested real-case simulations with WRF (V4.4). The segmentation fault usually occurs about 1 day after model initialization. There is usually no other useful information (such as CFL messages) in the rsl files. Sometimes there is a single CFL message just before the segmentation fault, and the CFL number is extremely large. For example, this was the only CFL message in a previous attempt: d02 2023-09-07_04:26:30 65 points exceeded v_cfl = 2 in domain d02 at time 2023-09-07_04:26:30 hours d02 2023-09-07_04:26:30 Max W: 497 432 2 W: 6354.31 w-cfl: 48.38 dETA: 0.01. Reducing the time step does not help either.

Last year I faced a similar problem while running an idealized tropical cyclone case and resolved it by modifying the dynamics option in namelist.input (see the thread "SIGSEGV, Segmentation fault occurred while running WRF 4.4 for a nested, idealized tropical cyclone case"). That method does not seem to work for real-case simulations. This problem troubles me a lot, because I run WRF often in my current research. I previously asked my classmates for help. Strangely, they can usually run the simulation successfully on our school's HPC platform with the same input data to WPS and very similar (but not exactly the same) namelist settings. Since I should not keep asking others for help, I want to figure out exactly what is going wrong on my side. I have attached the namelist from my latest attempt.

I need your help, and I appreciate you taking the time to read this and try to sort it out for me. I am an undergraduate student majoring in atmospheric science in China, so my English may be poor. Apologies for the inconvenience.

Thanks in advance.
 

Attachments

  • namelist.input
    4.1 KB · Views: 4
  • namelist.wps
    762 bytes · Views: 1
Hi, I am experiencing similar issues with invalid memory references. I noticed that your time step of 424 is quite large. I recommend starting with a smaller time step, calculated as dt = 6 * dx (where dx is in kilometers and dt is in seconds). For instance, for a domain with a 9 km grid spacing, the time step should be at most 6 * 9 = 54 s. For other domains, such as those with 3 km and 1 km grid spacing, you should use the same ratio to determine the appropriate time step. This adjustment should help mitigate the problem.
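The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names `max_time_step` and `nest_time_step` are made up here, and the 9/3/1 km spacings are the example values from this post, not values read from the attached namelist. In WRF, `time_step` in namelist.input applies to the outermost domain, and nests derive theirs through `parent_time_step_ratio`.

```python
def max_time_step(dx_km, factor=6.0):
    """Upper bound on the time step in seconds for a grid spacing in km (dt = 6 * dx rule)."""
    return factor * dx_km


def nest_time_step(parent_dt, parent_time_step_ratio):
    """Time step of a nest, given the parent time step and parent_time_step_ratio."""
    return parent_dt / parent_time_step_ratio


if __name__ == "__main__":
    # Example grid spacings from the post above (hypothetical, not from the namelist).
    for dx_km in (9, 3, 1):
        print(f"dx = {dx_km} km: time step should not exceed {max_time_step(dx_km):.0f} s")
```

With a 3:1 nesting ratio the nests automatically stay on the same 6 * dx bound as the parent, so only the outermost `time_step` needs to be chosen by hand.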

Tell me if it works!

Vazquez Ballesta Manuarii
 
My time step is actually 24 s; I added a "4" by mistake before uploading the namelist. I also tried 45 s before, but the error occurred, so I reduced the time step.
 
It's surprising that you're still having issues with a time step of 24 seconds. Have you tried reducing it to 10 seconds? That worked for me, although my domain resolutions might differ from yours.
 
If you can upload all your rsl.error and rsl.out files, that will help diagnose it.
This is the only rsl file that contains an error message. The other rsl files contain "forrtl: error (78): process killed (SIGTERM)", and I cannot see anything else abnormal in them. Reducing the time step, setting w_damping=1, and modifying epssm seldom help when I face an error like this; they may only make the model run a few hours longer. Also, as I mentioned above, when I ask others for help they can run the case successfully with a time step of 5×dx, so I want to figure out what is going wrong on my side.
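For anyone checking many rsl files by hand, a small Python sketch like the one below can list every CFL or crash message in a run directory. It is not an official WRF tool. The regular expressions are written against the CFL message quoted in the first post and the forrtl message quoted here, so other WRF versions or compilers may print a different format; the function names `scan_text` and `scan_directory` are made up for this example.

```python
import re
from pathlib import Path

# Patterns modelled on the messages quoted in this thread (an assumption about
# the log format, not a specification of it).
CFL_RE = re.compile(
    r"(\d+)\s+points exceeded\s+(\w+)\s*=\s*([\d.]+)\s+in domain\s+(d\d+)\s+at time\s+(\S+)"
)
MAXW_RE = re.compile(
    r"Max\s+W:\s+(\d+)\s+(\d+)\s+(\d+)\s+W:\s*(-?[\d.]+)\s+w-cfl:\s*([\d.]+)",
    re.IGNORECASE,
)
CRASH_MARKERS = ("SIGSEGV", "Segmentation fault", "forrtl: error")


def scan_text(text):
    """Return a list of dicts describing CFL and crash messages found in text."""
    hits = []
    for line in text.splitlines():
        m = CFL_RE.search(line)
        if m:
            hits.append({"kind": "cfl", "points": int(m.group(1)),
                         "domain": m.group(4), "time": m.group(5)})
        m = MAXW_RE.search(line)
        if m:
            hits.append({"kind": "max_w", "i": int(m.group(1)), "j": int(m.group(2)),
                         "k": int(m.group(3)), "w": float(m.group(4)),
                         "w_cfl": float(m.group(5))})
        if any(marker in line for marker in CRASH_MARKERS):
            hits.append({"kind": "crash", "line": line.strip()})
    return hits


def scan_directory(run_dir="."):
    """Scan every rsl.* file in run_dir; return {file name: hits} for files with hits."""
    report = {}
    for path in sorted(Path(run_dir).glob("rsl.*")):
        hits = scan_text(path.read_text(errors="replace"))
        if hits:
            report[path.name] = hits
    return report


if __name__ == "__main__":
    for name, hits in scan_directory(".").items():
        for hit in hits:
            print(name, hit)
```

The grid indices reported in a Max W line show where the vertical velocity blew up, which can then be compared against steep terrain or the nest boundary.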
 

Attachments

  • rsl.error.0049.txt
    37.1 KB · Views: 2
@6575,
It's difficult to determine what the problem is for every case of yours that runs into issues, but we can focus on this particular case.

1) You mentioned that your classmates are able to run this case with an "almost" identical namelist with a time_step of 5xDX. Are you able to obtain their namelist and share it so that we can see what the differences are?

2) Are you using the same version of WRF and compilers?

3) Were there any modifications to the code you're using?

A couple of recommendations are:

1) Try using more processors. With your domain sizes you could use up to about 1500 total, without issues.
2) Try setting your time_step to a number that divides evenly into an hour - so maybe something like 45 or 30, instead of 42.
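The second recommendation is easy to check with a short Python sketch. The helper names `divides_hour` and `largest_even_step` are hypothetical, and "divides evenly into an hour" is read here as "3600 seconds is an integer multiple of the time step".

```python
SECONDS_PER_HOUR = 3600


def divides_hour(time_step):
    """True if an hour is an integer number of time steps of this length (seconds)."""
    return SECONDS_PER_HOUR % time_step == 0


def largest_even_step(max_step):
    """Largest integer time step (seconds) not above max_step that divides an hour evenly."""
    for step in range(int(max_step), 0, -1):
        if divides_hour(step):
            return step
    raise ValueError("max_step must be at least 1 second")


if __name__ == "__main__":
    for candidate in (45, 42, 30):
        print(candidate, "divides an hour evenly:", divides_hour(candidate))
```

Calling `largest_even_step` with whatever upper bound you are working from gives the nearest usable value below it.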
 
All of us use the same version of WRF and the same compiler on the same HPC platform, and the WRF source code we use is unmodified. We are only allowed to use about 100 processors at most on this HPC platform. To clarify what I mentioned earlier: my classmates can run other cases successfully (although the results differ a lot from observations), but not this case. Because that was about 1 year ago, I have lost my namelist.wps; I can upload the other files.
 

Attachments

  • namelist.wps
    683 bytes · Views: 0
  • namelist.input
    3.6 KB · Views: 0
  • my_namelist.input
    3.7 KB · Views: 0
For this case, my supervisor (I will be studying for a master's degree over the next 3 years) has run it successfully with the following namelist and GDAS data. She gave me her namelist, and I have done some testing. I ran the model with ERA5 data, as I did previously. First, I fed the previously generated wrfinput and wrfbdy files to the model but used her namelist.input (with e_vert modified to match my input files); the segmentation fault still occurred after about 1 day. Then I used her namelist.wps and namelist.input to run the model, and it completed successfully. After that, I put my map_proj, ref_lat, and truelat1 into her namelist.wps and re-ran WPS and WRF; that also completed successfully. So I want to find out exactly what went wrong, as I will probably use WRF a lot in the future.
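To narrow down which setting matters, the two namelists can be compared entry by entry rather than by eye. Below is a minimal Python sketch with a hand-written parser. The names `parse_namelist` and `diff_namelists` are made up for this example, and the parser is deliberately simple: it assumes one `&group ... /` layout with `key = value, value` lines and does not handle `!`, `,` or `=` inside quoted strings. The values in the example are invented for illustration and are not taken from the attached files.

```python
def parse_namelist(text):
    """Parse namelist text into {(group, key): [values]} with all values kept as strings."""
    settings = {}
    group = None
    key = None
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        if line.startswith("&"):
            group, key = line[1:].strip().lower(), None
            continue
        if line == "/":
            group, key = None, None
            continue
        if group is None:
            continue
        if "=" in line:
            name, _, rhs = line.partition("=")
            key = (group, name.strip().lower())
            settings[key] = []
        else:
            rhs = line  # continuation of the previous entry's value list
        if key is None:
            continue
        settings[key].extend(v.strip() for v in rhs.split(",") if v.strip())
    return settings


def diff_namelists(text_a, text_b):
    """Return {(group, key): (values_a, values_b)} for entries that differ; None means missing."""
    a, b = parse_namelist(text_a), parse_namelist(text_b)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Invented example values, for illustration only.
    mine = "&domains\n time_step = 24,\n/\n&dynamics\n w_damping = 1,\n/\n"
    hers = "&domains\n time_step = 45,\n/\n&dynamics\n w_damping = 1,\n epssm = 0.5,\n/\n"
    for (group, key), (va, vb) in diff_namelists(mine, hers).items():
        print(f"&{group} {key}: mine={va} hers={vb}")
```

Running this on the failing and working namelist.wps files, then on the two namelist.input files, gives a short list of candidate settings to change one at a time.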
 

Attachments

  • namelist.input
    5.8 KB · Views: 0
  • namelist.wps
    902 bytes · Views: 0