
Segmentation fault and TBOUND exceeds table limit

kyle_r

Hello,

I am having difficulty getting wrf.exe to run for more than about 1-1.5 hr. At that point it always fails with the message:
forrtl: severe (174): SIGSEGV, segmentation fault

Often I also get the following message, and the segmentation fault follows shortly afterward:
rrtm: TBOUND exceeds table limit: reset 395.731

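If it helps to see how the TBOUND warning lines up with the model clock, a small script can pull both out of an rsl file. This is a minimal Python sketch. It assumes the warning is printed exactly as quoted above and that model time appears in WRF's usual "Timing for main: time ... on domain" lines. The function names (tbound_warnings, scan_file) are hypothetical helpers and are not part of WRF.

```python
import re

# The RRTM warning quoted in the post, e.g.
#   rrtm: TBOUND exceeds table limit: reset 395.731
TBOUND_RE = re.compile(r"TBOUND exceeds table limit: reset\s+([-+]?\d+(?:\.\d+)?)")
# Assumed format of WRF's per-step timing line, e.g.
#   Timing for main: time 2010-07-01_06:15:00 on domain   1: ...
TIMING_RE = re.compile(r"Timing for main: time (\S+) on domain\s+\d+")


def tbound_warnings(lines):
    """Return (line_number, last_model_time, reset_value) for each TBOUND warning.

    last_model_time is the most recent 'Timing for main' stamp seen before
    the warning, or None if no timing line has appeared yet.
    """
    hits = []
    last_time = None
    for lineno, line in enumerate(lines, start=1):
        timing = TIMING_RE.search(line)
        if timing:
            last_time = timing.group(1)
        warning = TBOUND_RE.search(line)
        if warning:
            hits.append((lineno, last_time, float(warning.group(1))))
    return hits


def scan_file(path):
    """Print every TBOUND warning in an rsl file such as rsl.error.0000."""
    with open(path, errors="replace") as fh:
        hits = tbound_warnings(fh)
    if not hits:
        print("no TBOUND warnings found")
    for lineno, model_time, value in hits:
        print(f"line {lineno}: after model time {model_time}, TBOUND reset {value}")
    return hits
```

For example, scan_file("rsl.error.0000") lists each warning with the last model time logged before it. You can then compare that time with the "fail time" of the same run.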
I've gone through a number of forum pages and tried the suggestions given to other users, but I continue to get the same result. I was hoping someone might have advice on other things I could try.
I am using NARR forcing data with grid resolutions of 9, 3, 1, and 1 km (there are two innermost domains). I have attached my most recent namelist.input and rsl.error files. The namelist.input settings I have tried, and their results, are in the table below. The "fail time" column indicates the model time at which the run failed; the longest run simulated up to ~45 minutes. Based on my smallest domain and the results from the number_of_procs.py script, the highest number of processors I can use is 120 (3 nodes, 40 cores each). Any help would be much appreciated.
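number_of_procs.py isn't shown here, so the following only illustrates the kind of check involved. It splits the task count into an X by Y layout and works out how many grid cells each task gets on every domain. This is a Python sketch under several assumptions. The near-square split mimics the "Ntasks in X / ntasks in Y" layout WRF reports, but it may not match WRF's exact algorithm. The 10-cell minimum patch is an assumed threshold. The domain sizes are hypothetical placeholders for the e_we/e_sn values in namelist.input.

```python
import math

# Assumed minimum patch size in grid cells; recent WRF versions are reported to
# stop when a patch is smaller than roughly this, but check your version.
MIN_PATCH = 10


def near_square_factors(ntasks):
    """Split ntasks into (nx, ny) with nx * ny == ntasks and nx <= ny, as square as possible."""
    if ntasks < 1:
        raise ValueError("ntasks must be >= 1")
    nx = math.isqrt(ntasks)
    while ntasks % nx != 0:
        nx -= 1
    return nx, ntasks // nx


def patch_sizes(e_we, e_sn, ntasks):
    """Approximate cells per task in x and y (mass grid is (e_we - 1) x (e_sn - 1))."""
    nx, ny = near_square_factors(ntasks)
    return (e_we - 1) // nx, (e_sn - 1) // ny


def check_decomposition(domains, ntasks, min_patch=MIN_PATCH):
    """Return (name, px, py, ok) for each domain; ok means both sizes >= min_patch."""
    report = []
    for name, (e_we, e_sn) in domains.items():
        px, py = patch_sizes(e_we, e_sn, ntasks)
        report.append((name, px, py, px >= min_patch and py >= min_patch))
    return report


# Hypothetical e_we/e_sn values; replace with the ones from namelist.input.
domains = {"d01": (150, 150), "d02": (181, 181), "d03": (121, 121), "d04": (121, 121)}
for ntasks in (1, 40, 120):
    for name, px, py, ok in check_decomposition(domains, ntasks):
        print(f"{ntasks:4d} tasks  {name}: {px} x {py} cells per task  {'ok' if ok else 'too small'}")
```

The smallest domain gets the fewest cells per task, so it sets the ceiling on the task count. That matches the reasoning behind the 120-processor limit above.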



attempt #  nodes  cores/node  radt     time_step  sf_urban_physics  swint_opt  epssm  fail time  TBOUND error
1          2      40          9        30         1                 0          N/A    6:45       Y
2          1      40          9        30         1                 0          N/A    6:45       Y
3          1      40          3        30         1                 0          N/A    6:45       Y
4          1      40          9,3,1,1  30         1                 0          N/A    6:45       Y
5          1      40          9,3,1,1  45         1                 0          N/A    6:27       N
6          3      40          9,3,1,1  45         1                 0          N/A    6:27       N
7          1      30          9,3,1,1  30         1                 0          N/A    6:45       Y
8          1      30          9,3,1,1  30         1                 0          0.5    6:36       N
9          3      40          3        30         1                 1          0.5    6:16       N
10         3      40          3        30         0                 1          0.5    6:18       Y
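A quick sanity check on the time_step column is the usual WRF rule of thumb: the outer-domain time_step should not exceed about 6 * DX, with DX in km. For the 9 km domain that gives 54 s. The Python sketch below applies that guideline to each domain's effective time step. The parent_id and parent_time_step_ratio values are hypothetical (d02 nested in d01, both 1 km domains nested in d02, ratio 3), so substitute the ones from namelist.input.

```python
def max_time_step(dx_km, factor=6.0):
    """Rule-of-thumb upper bound on time_step in seconds: factor * DX (km)."""
    return factor * dx_km


def domain_time_steps(time_step, parent_id, parent_time_step_ratio):
    """Effective time step (s) of every domain, keyed by 1-based domain number.

    Assumes each parent is listed before its nests, as in namelist.input;
    the first entries of parent_id and parent_time_step_ratio belong to d01.
    """
    steps = {1: float(time_step)}
    for dom in range(2, len(parent_id) + 1):
        parent = parent_id[dom - 1]
        steps[dom] = steps[parent] / parent_time_step_ratio[dom - 1]
    return steps


def within_guideline(dt, dx_km):
    """True if dt does not exceed the 6 * DX guideline for spacing dx_km."""
    return dt <= max_time_step(dx_km)


# Hypothetical nesting: d02 inside d01, d03 and d04 both inside d02, ratio 3.
dx_km = [9, 3, 1, 1]
parent_id = [1, 1, 2, 2]
ratio = [1, 3, 3, 3]
for time_step in (30, 45):
    for dom, dt in domain_time_steps(time_step, parent_id, ratio).items():
        status = "ok" if within_guideline(dt, dx_km[dom - 1]) else "above guideline"
        print(f"time_step={time_step}: d{dom:02d} dt={dt:.2f} s "
              f"(6*DX = {max_time_step(dx_km[dom - 1]):.0f} s) {status}")
```

With these assumed values, both tested time steps (30 s and 45 s) stay under the guideline on every domain. This check alone does not flag them.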
 

Attachments

  • rsl.error.0000
  • namelist.input
Hi,
Although you said you have tried a variety of processor counts, your rsl* file shows (near the top) that you are only using a single processor:

Code:
 Ntasks in X            1, ntasks in Y            1

I'm not sure why there's a discrepancy between how you intend to run it and the fact that it's only using 1 core. Given the size of your domains, though, this could very well be contributing to the model stopping. If you compiled with the code configured for dmpar (distributed memory) and you are certain you're correctly requesting the desired number of processors, I would recommend talking to a systems administrator at your institution to determine what the issue is.
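A quick way to confirm how many MPI tasks a run actually got is to read that Ntasks line programmatically and compare it with what was requested (120 for 3 nodes with 40 cores each). This is a minimal Python sketch that assumes the line format shown above. mpi_layout, check_task_count, and check_file are hypothetical helper names.

```python
import re

# Matches the layout line WRF writes near the top of rsl.error.0000, e.g.
#   Ntasks in X            1, ntasks in Y            1
NTASKS_RE = re.compile(r"Ntasks in X\s+(\d+),\s*ntasks in Y\s+(\d+)", re.IGNORECASE)


def mpi_layout(lines):
    """Return (ntasks_x, ntasks_y) from the first Ntasks line, or None if absent."""
    for line in lines:
        match = NTASKS_RE.search(line)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


def check_task_count(lines, expected_tasks):
    """Compare the task count WRF reports with the count you meant to request."""
    layout = mpi_layout(lines)
    if layout is None:
        return "no Ntasks line found"
    nx, ny = layout
    actual = nx * ny
    if actual == expected_tasks:
        return f"ok: {nx} x {ny} = {actual} tasks"
    return f"mismatch: WRF reports {nx} x {ny} = {actual} task(s), expected {expected_tasks}"


def check_file(path, expected_tasks):
    """Run check_task_count on an rsl file such as rsl.error.0000."""
    with open(path, errors="replace") as fh:
        return check_task_count(fh, expected_tasks)
```

For example, check_file("rsl.error.0000", 120) on the attached file would report a 1 x 1 mismatch. That means the run is not being spread across the requested cores.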
 