
Segmentation fault: CFL error

samudra

I have been trying to run wrf.exe with MPI on 2 nodes with 24 processes per node (ppn). I have attached my namelist.input and rsl.error files here.
The run fails with a segmentation fault caused by CFL violations: the CFL value exceeds 2 at certain grid points. I tried the solutions posted in a few threads, such as reducing time_step to 4×DX or 3×DX (DX in km), or adding namelist options that smooth the topography. I have also tried all the solutions in the thread "Segmentation Faults and CFL Errors", as well as the adaptive time step, but none of them worked. Kindly let me know what could be done differently.
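For readers unfamiliar with the time_step rule of thumb mentioned above, here is a minimal sketch of the arithmetic. It assumes the commonly cited guideline that time_step (in seconds) should not exceed about 6×DX with DX in km; the 9 km grid spacing is a hypothetical value, not taken from the attached namelist.

```python
def suggested_time_step(dx_m: float, factor: float = 6.0) -> int:
    """Return a time step in whole seconds computed as factor * DX (DX in km).

    dx_m   : grid spacing in metres, as given by dx in namelist.input
    factor : multiplier; 6 is the usual upper bound, 4 or 3 are more conservative
    """
    if dx_m <= 0 or factor <= 0:
        raise ValueError("dx_m and factor must be positive")
    # Convert metres to km, scale, and round down to a whole second (minimum 1 s).
    return max(1, int(factor * dx_m / 1000.0))


if __name__ == "__main__":
    dx_m = 9000.0  # hypothetical 9 km grid spacing
    for factor in (6, 4, 3):
        print(f"{factor} x DX -> time_step = {suggested_time_step(dx_m, factor)} s")
```

If a run still violates CFL at 3×DX right after start-up, a smaller time step is unlikely to be the real fix, and the input data become the more likely suspect.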
 

Attachments

  • namelist.input
    4.5 KB · Views: 5
  • rsl.error.0046.txt
    15.3 KB · Views: 2
Your namelist.input looks fine. However, the CFL violation occurred almost immediately after the model started, which suggests that the input data may not be correct.

What data did you use to drive this case? Did you run tc.exe before running wrf.exe?
 
Hi,
I used ERA5 reanalysis data downloaded from the RDA website. I have previously run some cases with one nested domain using the same dataset and configuration for 8-10 days, but for different dates and times, and they ran successfully. Regarding tc.exe, I am not running cyclone-specific cases; I want to do a long run, say for a month. The tc.exe-related settings have simply been in the namelist file since earlier runs. Could that be causing the problem?
 
NCAR RDA has updated ERA5 to netCDF format. Are you using the updated data or the old data in GRIB format?
 
I have repeated your case using your namelist.wps and namelist.input. The only changes I made were to set the lateral boundary interval to 6 hours and to specify the start and end times in the namelist. The case completed successfully (I ran WRF for 6 hours). Please see the attached namelist.input and namelist.wps.

Can you try with the namelist files I modified? Hopefully the run will complete, and then you can move on to your specific case.

Let me know if you still have any issues.
 

Attachments

  • namelist.wps.txt
    1.5 KB · Views: 4
  • namelist.input.txt
    4.5 KB · Views: 6
Hi,
Based on your suggestions, the model ran smoothly. Thanks!!
But I have a general question about long runs. Let's say I want to run the model for a month over a specific domain (assuming a single domain for now). I initialise the model at time t=0 with one input file, which is the only input I provide, and let the model evolve over that month. Is such a run possible? If so, what changes have to be made in the namelist files? My guess is that, if it is possible, start_date and end_date would have to be the same to indicate that only one input is being given.
 
Thanks for the update. I am glad the case works for you!

Regarding your month-long simulation, it is completely feasible to run WRF over long periods such as months or even years. The only option you need to add is sst_update, which updates the SST during the run so that it stays reasonable over a long time span.

The start and end times should be specified so that WRF knows over what period to integrate.
 
Hi, a couple more questions 😬
1) By start and end time, you are referring to making changes in both of the namelist files, right?
2) Also, when I specify the start and end time, do I need input files at the specified intervals for the whole period? I tried a month-long run but gave only one input file (at the starting date), and metgrid.exe reported an error.
 
1) By start and end time, you are referring to making changes in both of the namelist files, right?
Yes, this is correct.
2) Also, when I specify the start and end time, do I need input files at the specified intervals for the whole period? I tried a month-long run but gave only one input file (at the starting date), and metgrid.exe reported an error.
The input data should cover the entire period of your integration. This is because WRF is a regional model and requires lateral boundary forcing at regular intervals throughout the run.

Please read the WRF User's Guide to learn more about the modeling system.
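To make the point about input coverage concrete, the following minimal sketch lists the times at which input data would be expected for a given period. The dates are hypothetical, the 6-hour interval (21600 s) follows the boundary interval used earlier in this thread, and the file naming assumes the usual metgrid netCDF output pattern met_em.d<domain>.<date>.nc.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d_%H:%M:%S"  # date format used in namelist.wps and met_em file names


def expected_input_times(start: str, end: str, interval_seconds: int) -> list:
    """Return every time from start to end (inclusive) at the given interval."""
    t0 = datetime.strptime(start, FMT)
    t1 = datetime.strptime(end, FMT)
    if interval_seconds <= 0 or t1 < t0:
        raise ValueError("need a positive interval and end >= start")
    step = timedelta(seconds=interval_seconds)
    times = []
    t = t0
    while t <= t1:
        times.append(t.strftime(FMT))
        t += step
    return times


def expected_met_em_files(start: str, end: str, interval_seconds: int, domain: int = 1) -> list:
    """Return the met_em file names expected for one domain over the period."""
    return [f"met_em.d{domain:02d}.{t}.nc"
            for t in expected_input_times(start, end, interval_seconds)]


if __name__ == "__main__":
    # Hypothetical one-month run with 6-hourly boundary data.
    files = expected_met_em_files("2020-06-01_00:00:00", "2020-07-01_00:00:00", 21600)
    print(len(files), "files expected, from", files[0], "to", files[-1])
```

A single input file at the start date covers only the first entry of this list, which is consistent with metgrid.exe failing when it looks for the later times.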
 
Hi Ming,
I did a 10-day run and it ran smoothly. Then I tried a one-and-a-half-month run for a particular year, with the sst_update option as suggested in the previous replies. The SST files had already been downloaded along with the surface input files from the RDA website. I have attached my namelist.input file. The model runs smoothly for around 12 days and then suddenly stops. I checked my rsl.error files and none of them shows an error that might help solve the problem (I have attached one rsl.error file). Can you look into the issue and give some suggestions?
 

Attachments

  • rsl.error.0000
    652.7 KB · Views: 2
  • namelist.input
    4.6 KB · Views: 2
There is no error message in your rsl.error.0000 file. Can you look at your other rsl files and find possible error messages? The error messages are important for us to figure out what is wrong. Note that such messages can appear in any of the rsl files, not only the first one.
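Checking dozens of rsl files by hand is tedious, so a small script can help. The sketch below scans every rsl.error.* and rsl.out.* file in a directory for a few keywords; the keyword list (cfl, SIGSEGV, Segmentation fault, FATAL) is an assumption about what is worth flagging and can be adjusted.

```python
import glob
import os
import re

# Assumed keywords of interest; extend or narrow this as needed.
KEYWORDS = re.compile(r"cfl|sigsegv|segmentation fault|fatal", re.IGNORECASE)


def scan_rsl_files(directory: str = ".", pattern=KEYWORDS) -> dict:
    """Return {file name: [(line number, line text), ...]} for matching lines."""
    paths = sorted(
        glob.glob(os.path.join(directory, "rsl.error.*"))
        + glob.glob(os.path.join(directory, "rsl.out.*"))
    )
    hits = {}
    for path in paths:
        # errors="replace" avoids crashing on stray non-UTF-8 bytes in the logs.
        with open(path, "r", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if pattern.search(line):
                    hits.setdefault(os.path.basename(path), []).append(
                        (lineno, line.rstrip("\n"))
                    )
    return hits


if __name__ == "__main__":
    for name, lines in scan_rsl_files(".").items():
        print(f"{name}: {len(lines)} matching line(s)")
        for lineno, text in lines[:5]:  # show only the first few per file
            print(f"  {lineno}: {text}")
```

Run it from the directory that holds the rsl files; only files containing at least one match are reported.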
 
Hi, I ran the case on 72 processors and found errors in the rsl files of 4 of the 72 processors. I am attaching those error files here. It seems to be a segmentation fault, as shown below:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7fc60501a171 in ???
#1 0x7fc605019313 in ???
#2 0x7fc6044a9acf in ???
#3 0x202977b in ???
#4 0x204ca47 in ???
#5 0x2072538 in ???
#6 0x208200c in ???
#7 0x1a1368d in ???
#8 0x1b1f569 in ???
#9 0x15ac73a in ???
#10 0x1441dd3 in ???
#11 0x486b95 in ???
#12 0x404e61 in ???
#13 0x40491c in ???
#14 0x7fc604495ca2 in ???
#15 0x40495d in ???
#16 0xffffffffffffffff in ???

I cannot understand why this error occurs after the model has already run for 12 days. Kindly look into it.
Thanks !!
 

Attachments

  • rsl.error.0053.txt
    8.3 KB · Views: 0
  • rsl.error.0054.txt
    8.3 KB · Views: 2
  • rsl.error.0061.txt
    8.3 KB · Views: 2
  • rsl.error.0062.txt
    8.4 KB · Views: 0
In your rsl files, I can only find the error message "Segmentation fault - invalid memory reference". Since your case failed after 12 days of integration, I suspect that something went wrong in the physics or dynamics. I would suggest the following:

(1) Look at the wrfout files written before the model crashed, and check whether any variables look odd or physically unreasonable.

(2) Recompile WRF in debug mode, then rerun your case. The log file will tell you exactly when and where the model first crashed, which should give you some hints about what is wrong.
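As an illustration of suggestion (1), here is a minimal sketch of a range check that flags non-finite or out-of-range values. It works on a plain sequence of numbers; reading a variable out of a wrfout file requires a netCDF reader, which is not shown. The 2 m temperature limits of 180 K and 340 K are assumed values for illustration only.

```python
import math


def find_suspicious(values, lower: float, upper: float) -> list:
    """Return (index, value) pairs that are NaN, infinite, or outside [lower, upper]."""
    return [
        (i, v)
        for i, v in enumerate(values)
        if not math.isfinite(v) or v < lower or v > upper
    ]


if __name__ == "__main__":
    # Hypothetical 2 m temperatures in kelvin, flattened from one time slice.
    t2 = [285.3, 290.1, float("nan"), 288.7, 412.0]
    for index, value in find_suspicious(t2, lower=180.0, upper=340.0):
        print(f"point {index}: suspicious value {value}")
```

Applying a check like this to the last few output times before the crash can show which variable goes wrong first and roughly where in the domain.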
 