Segmentation fault and TBOUND exceeds table limit

kyle_r · May 17, 2023

Hello,

I am having trouble getting wrf.exe to run for more than about 1-1.5 hr. At that point it always fails with the message:
forrtl: severe (174): SIGSEGV, segmentation fault

Often I also get the following message, and the segmentation fault follows shortly afterwards:
rrtm: TBOUND exceeds table limit: reset 395.731
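To see when these two messages first appear, and in which processes' logs, the rsl.error.* files can be scanned. Below is a minimal sketch. It assumes the messages appear verbatim as quoted above, and that the forrtl SIGSEGV line is captured in the same rsl files. The function names are hypothetical. The sketch also converts the reported TBOUND value from kelvin to °C: 395.731 K is about 122.6 °C. If TBOUND is the surface temperature handed to RRTM, as seems likely, that value is clearly unphysical.

```python
import glob
import re

# The two messages quoted above; assumed to appear verbatim in the rsl files.
TBOUND_RE = re.compile(r"TBOUND exceeds table limit: reset\s+([-+]?\d+(?:\.\d*)?)")
SEGV_RE = re.compile(r"SIGSEGV")


def parse_lines(lines):
    """Collect TBOUND reset values (K) and the 1-based line of the first SIGSEGV."""
    tbounds, first_segv = [], None
    for lineno, line in enumerate(lines, start=1):
        match = TBOUND_RE.search(line)
        if match:
            tbounds.append(float(match.group(1)))
        if first_segv is None and SEGV_RE.search(line):
            first_segv = lineno
    return tbounds, first_segv


def scan_rsl(pattern="rsl.error.*"):
    """Map each matching rsl file (sorted by name) to the result of parse_lines()."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as fh:
            results[path] = parse_lines(fh)
    return results


def kelvin_to_celsius(t_k):
    return t_k - 273.15


if __name__ == "__main__":
    for path, (tbounds, segv) in scan_rsl().items():
        if tbounds:
            hottest = max(tbounds)
            print(f"{path}: {len(tbounds)} TBOUND reset(s), max {hottest:.3f} K "
                  f"({kelvin_to_celsius(hottest):.1f} C)")
        if segv is not None:
            print(f"{path}: first SIGSEGV on line {segv}")
```

Comparing which rsl file reports the first TBOUND reset with the patch that task covers may help narrow down where the model first goes wrong.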

I've gone through a number of forum pages and tried the suggestions given to other users, but I keep getting the same result. I was hoping someone might have advice on other things I could try.
I am using NARR forcing data with four domains at grid resolutions of 9, 3, 1, and 1 km (that is, two innermost 1 km domains). I have attached my most recent namelist.input and rsl.error files. The namelist.input settings I have tried and their results are in the table below. The "fail time" column gives the model time at which each run failed; the longest run reached only about 45 minutes of simulated time. Based on my smallest domain and the results from the number_of_procs.py script, the highest number of processors I can use is 120 (3 nodes, 40 cores each). Any help would be much appreciated.
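The processor limit mentioned above depends on the smallest domain's dimensions. Below is a minimal sketch of that kind of estimate. It assumes the commonly quoted WRF rule of thumb that each decomposed patch should keep at least about 25 grid points per side, which gives an upper bound of roughly (e_we/25) * (e_sn/25) tasks. This is not the contents of number_of_procs.py. The function name and the dimensions used are hypothetical placeholders, since the namelist values are not reproduced in this post.

```python
def max_procs(e_we, e_sn, min_points_per_side=25):
    """Rule-of-thumb upper bound on MPI tasks for an e_we x e_sn domain.

    Assumes every patch should keep at least `min_points_per_side` grid
    points in both x and y. Each direction is floored separately, which is
    slightly more conservative than dividing the product.
    """
    return (e_we // min_points_per_side) * (e_sn // min_points_per_side)


if __name__ == "__main__":
    # Hypothetical dimensions for the smallest (1 km) domain; substitute the
    # real e_we/e_sn values from namelist.input.
    e_we, e_sn = 200, 200
    print(f"rule-of-thumb maximum: {max_procs(e_we, e_sn)} tasks")
```

With the placeholder 200 x 200 grid, the estimate would be 8 * 8 = 64 tasks. The real bound follows only from the actual e_we and e_sn of the smallest domain.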

attempt #	nodes	cores/node	radt (min)	time_step (s)	sf_urban_physics	swint_opt	epssm	fail time	TBOUND error
1	2	40	9	30	1	0	N/A	6:45	Y
2	1	40	9	30	1	0	N/A	6:45	Y
3	1	40	3	30	1	0	N/A	6:45	Y
4	1	40	9,3,1,1	30	1	0	N/A	6:45	Y
5	1	40	9,3,1,1	45	1	0	N/A	6:27	N
6	3	40	9,3,1,1	45	1	0	N/A	6:27	N
7	1	30	9,3,1,1	30	1	0	N/A	6:45	Y
8	1	30	9,3,1,1	30	1	0	0.5	6:36	N
9	3	40	3	30	1	1	0.5	6:16	N
10	3	40	3	30	0	1	0.5	6:18	Y
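The table varies time_step between 30 s and 45 s. Below is a sketch of how each domain's effective time step can be checked against the usual WRF guideline of a time step no larger than about 6*dx, with dx in km and the time step in seconds. The nest layout and ratios are assumptions, because the namelist is not reproduced here. The sketch assumes d03 and d04 are both nested in d02, and that every parent_time_step_ratio equals 3, matching the 9:3:1 grid ratios. The Domain class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Domain:
    name: str
    dx_km: float
    parent: Optional[int]   # index of the parent in the list; None for d01
    time_step_ratio: int    # parent_time_step_ratio (values below are assumed)


def domain_time_steps(domains, time_step):
    """Each domain's time step (s), derived from d01's time_step.

    Parents must appear in the list before their children.
    """
    steps = []
    for d in domains:
        if d.parent is None:
            steps.append(float(time_step))
        else:
            steps.append(steps[d.parent] / d.time_step_ratio)
    return steps


def check_dt_guideline(domains, time_step, factor=6.0):
    """Return (name, dt, limit, ok) with limit = factor * dx_km for each domain."""
    report = []
    for d, dt in zip(domains, domain_time_steps(domains, time_step)):
        limit = factor * d.dx_km
        report.append((d.name, dt, limit, dt <= limit))
    return report


if __name__ == "__main__":
    # Hypothetical layout: d03 and d04 both nested in d02, all ratios 3.
    domains = [
        Domain("d01", 9.0, None, 1),
        Domain("d02", 3.0, 0, 3),
        Domain("d03", 1.0, 1, 3),
        Domain("d04", 1.0, 1, 3),
    ]
    for time_step in (30, 45):
        for name, dt, limit, ok in check_dt_guideline(domains, time_step):
            print(f"time_step={time_step}: {name} dt={dt:.2f} s, "
                  f"guideline <= {limit:.0f} s, {'ok' if ok else 'too large'}")
```

Under these assumptions, both 30 s and 45 s stay within the 6*dx guideline on every domain (45 s gives 45, 15, 5, and 5 s). If the real ratios match, the nominal time step alone is therefore not obviously too large.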

kwerner · May 18, 2023

Hi,
Although you said you have tried a variety of processor counts, the top of your rsl* file shows that you are only using a single processor:

Code:

 Ntasks in X            1, ntasks in Y            1

I'm not sure why there is a discrepancy between how you intend to run it and the single core it is actually using, but given the size of your domains, this could very well be contributing to the model stopping. If you compiled the code configured for dmpar (distributed memory) and you are certain you are requesting the desired number of processors correctly, I would recommend talking to a systems administrator at your institution to determine the cause.
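One quick way to confirm how many MPI tasks WRF actually started is to read the decomposition line quoted above from rsl.error.0000. Below is a minimal sketch. It assumes the line has the format shown above. The function names are hypothetical, and the expected count of 120 is the value from the original question.

```python
import os
import re

# Matches the decomposition line, e.g. " Ntasks in X            1, ntasks in Y            1"
NTASKS_RE = re.compile(r"Ntasks in X\s+(\d+),\s*ntasks in Y\s+(\d+)", re.IGNORECASE)


def ntasks_from_lines(lines):
    """Return (ntasks_x, ntasks_y) from the first matching line, or None."""
    for line in lines:
        match = NTASKS_RE.search(line)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


def check_task_count(path, expected):
    """Summarize the decomposition in `path` and compare it with `expected`."""
    with open(path, errors="replace") as fh:
        decomposition = ntasks_from_lines(fh)
    if decomposition is None:
        return f"{path}: no 'Ntasks in X' line found"
    nx, ny = decomposition
    total = nx * ny
    status = "OK" if total == expected else "MISMATCH"
    return f"{path}: {nx} x {ny} = {total} tasks (expected {expected}) {status}"


if __name__ == "__main__":
    if os.path.exists("rsl.error.0000"):
        print(check_task_count("rsl.error.0000", expected=120))
```

Counting the rsl.error.* files in the run directory gives a second check, since each MPI task in a dmpar build normally writes its own rsl.error/rsl.out pair.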
